FINAL TECHNICAL REPORT
1 DEC. 1988 - 31 MAY 1992


Introduction 

The goal of this three-year project was to better understand the human listener's ability to process important classes of complex sounds. In particular, we adopted a modulation-demodulation (mo-dem) model of auditory perception. That is, we took the position that useful information in the sound stream reaching the human listener may be characterized as modulations of signal amplitude and angle. Angle modulation can be expressed as either phase or frequency modulation. The listener's task in extracting information from the sound stream may then be characterized as a demodulation process. For this project, we have focused primarily on the processing of frequency-modulated (FM) signals.
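The mo-dem view can be illustrated numerically. A real signal is written as x(t) = e(t) cos φ(t). The envelope e(t) and the instantaneous frequency φ'(t)/2π are then read off the analytic signal z(t) as |z| and Im(z* dz/dt)/(2π|z|²). This is a minimal sketch, not the project's software. The FFT-based analytic signal (the same construction `scipy.signal.hilbert` uses), the 20-kHz rate, and the AM test tone are assumptions chosen for illustration.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: keep DC (and Nyquist), double positive bins, zero negative bins."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def demodulate(x, fs):
    """Return the envelope |z| and the instantaneous frequency (Hz) of a real signal x."""
    z = analytic_signal(x)
    f = np.fft.fftfreq(len(x), d=1.0 / fs)
    dz = np.fft.ifft(np.fft.fft(z) * 2j * np.pi * f)   # time derivative of z, computed spectrally
    env = np.abs(z)
    # IF = Im(conj(z) * dz/dt) / (2*pi*|z|^2); the floor avoids division by zero at envelope nulls
    inst_f = np.imag(np.conj(z) * dz) / (2 * np.pi * np.maximum(env, 1e-12) ** 2)
    return env, inst_f

# Hypothetical test signal: 1000-Hz carrier with 10-Hz, 50% amplitude modulation, 1 s at 20 kHz.
fs = 20_000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 10 * t)) * np.cos(2 * np.pi * 1000 * t)
env, inst_f = demodulate(x, fs)
```

For this tone, `env` follows 1 + 0.5 cos(2π·10t) and `inst_f` stays at 1000 Hz. The amplitude modulation and the angle modulation are recovered separately, which is the sense in which the listener's task is modeled as demodulation.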


Work conducted by the PI and several of his students and colleagues prior to the initiation of this project had led to the formulation of the EWAIF model for complex sound discrimination. EWAIF is an acronym for envelope-weighted average of instantaneous frequency. We had found that listener performance could often be predicted by calculating the EWAIF values for complex signal pairs that were discriminable in forced-choice paradigms. We began the series of studies with common-envelope signal pairs as defined by Voelker in his work on a Unified Theory of Modulation. Our first attempt to relax the common-envelope restriction appeared to be quite successful. We applied the EWAIF model to the array of sinusoids used in early profile analysis studies, and offered an alternative explanation for the apparent enhanced sensitivity to amplitude increments embedded in profile arrays of greater signal densities. That is, Green and his associates reported that listeners could detect ever smaller increments to the center component of the profile array as more sinusoids were added to the band occupied by the array. We calculated EWAIF values for flat versus incremented arrays and showed that the increment produced a reliable pitch shift. The shift depends on the relative amplitudes of the tones, not on their absolute levels; thus, roving-level manipulations did not render the cue unusable. Further, as tone density increased and just-detectable amplitude increments decreased, the difference in EWAIF values remained approximately constant. We thus concluded that the enhancement in profile performance with increased tone density was probably related to the pitch cue rather than to profile analysis per se. This work was reported at the Complex Sound Workshop sponsored by AFOSR at Sarasota, FL, in 1986.
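The EWAIF calculation for a flat versus an incremented profile array can be sketched numerically as Σ e(t) f_i(t) / Σ e(t), where e(t) is the envelope and f_i(t) the instantaneous frequency. The stimuli here are hypothetical, chosen for illustration and not taken from the original studies:

- an 11-component array with log-spaced frequencies from 200 to 5000 Hz;
- frequencies rounded to whole hertz, so every component completes an integer number of cycles in a 1-s window;
- a 3-dB increment to the 1000-Hz center component.

```python
import numpy as np

FS = 20_000
t = np.arange(FS) / FS                      # 1-s analysis window (assumed)

def ewaif(amps, freqs):
    """Envelope-weighted average of instantaneous frequency (Hz) for sum(a*cos(2*pi*f*t))."""
    a = np.asarray(amps, dtype=float)[:, None]
    f = np.asarray(freqs, dtype=float)[:, None]
    phasors = a * np.exp(2j * np.pi * f * t)
    z = phasors.sum(axis=0)                          # analytic signal
    dz = (2j * np.pi * f * phasors).sum(axis=0)      # its time derivative
    env = np.abs(z)
    inst_f = np.imag(np.conj(z) * dz) / (2 * np.pi * np.maximum(env, 1e-12) ** 2)
    return np.sum(env * inst_f) / np.sum(env)

# Hypothetical profile array: 11 log-spaced tones, 200-5000 Hz, center component at 1000 Hz.
freqs = np.round(200 * 25 ** (np.arange(11) / 10))
flat = np.ones(11)
incremented = flat.copy()
incremented[5] *= 10 ** (3 / 20)            # +3 dB on the center component

shift = ewaif(incremented, freqs) - ewaif(flat, freqs)
roved = ewaif(incremented * 10 ** (10 / 20), freqs)   # same array, overall level raised 10 dB
```

The increment produces a nonzero `shift`. Raising every component by the same 10 dB leaves the EWAIF value unchanged up to rounding, because the numerator and the denominator both scale with overall amplitude. This is why roving the level does not remove the cue.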

Recent work by Versfeld and Houtsma, however, has indicated that the common-envelope restriction cannot be disregarded for most narrow-bandwidth signals. In two reports they have demonstrated the inadequacy of EWAIF predictions for complex signal pair discriminations. We have replicated some of their results and are currently investigating the modifications required to apply a demodulation model to these results. Essentially, when the signal envelopes differ, as they do in Versfeld and Houtsma's signals, the listener is likely to use both amplitude and frequency information to distinguish between the signals. Anantharaman, Krishnamurthy, and the PI are currently working on a multi-channel version of the EWAIF model that will also incorporate envelope cues. (Preliminary results are shown in Figure 1, appended to this report.)

List  of  research  objectives  and  cumulative  progress 

1. Step vs. glide discrimination

Two manuscripts for publication have resulted from work on this phase of the proposal. John Madden's dissertation, which extended the original step vs. glide discrimination task to listeners with sensorineural hearing loss, appeared in the April 1992 issue of the Journal of Speech and Hearing Research. The manuscript detailing our earlier work with normal-hearing listeners was submitted to the Journal of the Acoustical Society of America in April 1992. It is currently undergoing some minor revisions requested by the reviewers. We anticipate that it will appear in the journal by late 1992 or early 1993. Details of this work can be found in copies of these manuscripts, which are appended to this report.

2.  Multi-channel  EWAIF  model 

Work on extending the original EWAIF model to incorporate an approximation of the peripheral filtering of the human auditory system has been especially fruitful. Jayanth Anantharaman, a graduate student in Electrical Engineering under the supervision of Co-Investigator Ashok Krishnamurthy, tackled this phase of the project as his master's thesis project. Anantharaman first devised a computationally more efficient form of the original EWAIF model, which we have dubbed the IWAIF model, that is, the intensity-weighted average of instantaneous frequency. The model output can be computed very efficiently in the frequency domain using the FFT, rather than in the time domain. Further, the result of the frequency-domain calculation has an appealing interpretation as the center of gravity of the sound spectrum. A convention presentation of the initial work was followed by a manuscript submitted to the Journal of the Acoustical Society of America in February 1992. A change of Associate Editors and the apparent overlap of content with a similar manuscript by Dai from Green's group at Florida have delayed the processing of the manuscript. Revisions are underway, and we plan to submit the revised manuscript by the end of August. A copy of the Anantharaman et al. manuscript is appended to this report. Work on the extension to a multi-channel model continues, as Anantharaman has begun work on his PhD project.
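The frequency-domain shortcut can be checked numerically. By Parseval's theorem, the time average of |z(t)|² times the instantaneous frequency equals the power-weighted mean frequency of the spectrum. The IWAIF value is therefore the spectral center of gravity. The sketch below computes both forms. It is an illustration under assumed conventions (the DC and Nyquist bins are excluded from both forms), not Anantharaman's implementation.

```python
import numpy as np

def iwaif_time(x, fs):
    """Time-domain IWAIF: sum(|z|^2 * IF) / sum(|z|^2), with z the analytic signal of x."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1.0 / fs)
    Z = np.where(f > 0, 2 * X, 0)                 # analytic spectrum; DC and negative bins dropped
    z = np.fft.ifft(Z)
    dz = np.fft.ifft(Z * 2j * np.pi * f)          # time derivative of z
    # |z|^2 * IF equals Im(conj(z) * dz) / (2*pi), so no division by the envelope is needed
    return np.sum(np.imag(np.conj(z) * dz)) / (2 * np.pi * np.sum(np.abs(z) ** 2))

def spectral_centroid(x, fs):
    """Frequency-domain form: power-weighted mean frequency over bins with 0 < f < fs/2."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = (f > 0) & (f < fs / 2)
    p = np.abs(X[keep]) ** 2
    return np.sum(f[keep] * p) / np.sum(p)

fs = 20_000
t = np.arange(fs) / fs
two_tone = np.cos(2 * np.pi * 1000 * t) + 0.5 * np.cos(2 * np.pi * 1100 * t)
noise = np.random.default_rng(0).standard_normal(4096)
```

For the two-tone example, both functions return (1²·1000 + 0.5²·1100)/(1² + 0.5²) = 1020 Hz. For an arbitrary signal such as `noise`, the two forms agree to rounding error. Only the FFT form avoids computing the instantaneous frequency sample by sample.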

3.  Single-step  vs.  glide  experiments 

This series of experiments was begun in collaboration with R. Gerren at Kansas University in 1987-88. Data collection was completed before the PI moved from Kansas to Ohio State. Gerren visited Ohio State in Nov. 1989 as a consultant on the current project, but a manuscript for publication has not been forthcoming. Lack of support for his work at Kansas has distracted Gerren from completing the work. With some help, the PI will produce the final draft of the paper for submission.


4.  FM  transitions  with  amplitude  contours 
The preliminary results of this portion of the project, initiated by Y. Y. Qi, were presented at the Nov. 1990 meeting of the Acoustical Society in San Diego. No further work was conducted on this research line because other aspects of the project required more time than originally projected. These preliminary results will be of value in the implementation of a multi-channel IWAIF model for speech-like complex sounds.


5.  Glide  direction  and  slope  discrimination 

The completion of experiments described under this phase of the proposal required the development of software for the “real-time” generation of frequency-modulated tones. Further work was required to conduct roving-frequency discrimination experiments running three listeners in independent adaptive-tracking tasks. Chien Yeh Hsu developed the required software so that we could conduct these experiments efficiently. A manuscript describing the software was submitted to Behavior Research Methods, Instruments, & Computers in April 1992. We are awaiting an editorial decision.

The initial experimental work on this phase of the project served as the master's thesis for Mary Neill. Her work was conducted before the adaptive-tracking portion of the software described above was completed. Preliminary reports of the work were presented at both ARO and Acoustical Society meetings. Ms. Neill has elected not to continue in the graduate program at this time, and preparation of a manuscript for publication has been delayed. A copy of the thesis work is appended to this report. We anticipate that a publishable manuscript will be finished in fall 1992.

Considerable time was spent in developing an adaptive tracking procedure for determining the just-discriminable frequency-glide slope. Prior work had required blocks of trials at fixed slope differences to establish full psychometric functions. Once we determined that these psychometric functions were monotonic and reasonably well-behaved, we moved to an adaptive testing paradigm. Our initial results from the adaptive testing led us on a long chase for possible procedural or equipment artifacts, because of an apparent hysteresis in the FM glide slope discrimination thresholds. When the target slope approached the standard slope from “above” (i.e., the target was steeper), the adaptive routine settled into a threshold slope difference that could be as much as ten times larger than when the routine approached from “below” (i.e., the target was flatter). Introducing roving starting frequency conditions further complicated the results. (See Figure 2, appended to this report.)
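The report does not state which tracking rule was used. As a hedged illustration only, the sketch below implements a generic two-down, one-up staircase on the slope difference with a multiplicative step; such a track converges near 70.7% correct. The rule, the step factor, the stopping criterion, and the simulated listener are all assumptions. Running the track separately with steeper and with flatter targets is one way to expose the direction-dependent thresholds described above.

```python
import math

def two_down_one_up(respond, start, factor=2.0, n_reversals=8, discard=2, max_trials=1000):
    """Adaptive track on a slope difference; respond(level) -> True if the trial was correct.

    Two consecutive correct responses divide the level by `factor`; one error multiplies it.
    The threshold is the geometric mean of the reversal levels after the first `discard`.
    """
    level, run, direction, reversals = start, 0, 0, []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            run += 1
            if run == 2:                    # two correct in a row: make the task harder
                run = 0
                if direction == +1:
                    reversals.append(level)
                direction = -1
                level /= factor
        else:                               # any error: make the task easier
            run = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level *= factor
    used = reversals[discard:] or reversals
    if not used:                            # track ended before any reversal
        return level
    return math.exp(sum(math.log(r) for r in used) / len(used))

# Hypothetical deterministic listener: correct whenever the slope difference is at least 1.0.
threshold = two_down_one_up(lambda d: d >= 1.0, start=8.0)
```

With this deterministic listener, the track oscillates between 0.5 and 1.0 and returns their geometric mean, about 0.707. With a real listener the reversals scatter, and the averaging over reversals matters more.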

We are now confident that these unexpected results are not simply due to artifacts in the experimental procedures. Similar hysteresis has been reported by Porter, Cullen, Collins, and Jackson [J. Acoust. Soc. Amer. 90, 1298-1308, 1991], who were investigating formant transition onset frequencies. While these hysteresis effects appear to be “real”, we have not formulated a reasonable explanation for their occurrence.

Work on this phase of the project continues with the PhD project of Hsu, who is developing a short-term running version of the IWAIF model to predict performance in FM glide slope and direction discrimination. The model will incorporate the “front end” of the Patterson-Holdsworth “Auditory Sensation Processor” model. The series of “source-filter” discrimination experiments described in the proposal will be conducted as part of Hsu's dissertation work.
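A short-term running IWAIF can be sketched by computing the intensity-weighted mean instantaneous frequency inside successive short frames. The sketch omits the Patterson-Holdsworth front end entirely. The 5-ms rectangular frames, the 2.5-ms hop, and the 900-to-1100-Hz test glides are assumptions chosen for illustration. The sketch only shows that such a running measure tracks glide direction and slope.

```python
import numpy as np

FS = 20_000

def glide(f_start, f_end, dur=0.050, ramp=0.005, pad=0.020, fs=FS):
    """Linear FM glide with cosine-squared ramps, zero-padded on both sides."""
    n_ramp, n_sweep, n_pad = (int(round(v * fs)) for v in (ramp, dur, pad))
    f = np.concatenate([np.full(n_ramp, f_start),
                        np.linspace(f_start, f_end, n_sweep),
                        np.full(n_ramp, f_end)])
    x = np.sin(2 * np.pi * np.cumsum(f) / fs)          # phase-continuous FM
    w = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
    x[:n_ramp] *= w
    x[-n_ramp:] *= w[::-1]
    return np.concatenate([np.zeros(n_pad), x, np.zeros(n_pad)])

def running_iwaif(x, frame=0.005, hop=0.0025, fs=FS):
    """Frame-by-frame IWAIF: sum(|z|^2 * IF) / sum(|z|^2) within each rectangular frame."""
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1.0 / fs)
    Z = np.where(f > 0, 2 * X, 0)                      # analytic-signal spectrum
    z = np.fft.ifft(Z)
    num = np.imag(np.conj(z) * np.fft.ifft(Z * 2j * np.pi * f)) / (2 * np.pi)   # |z|^2 * IF
    den = np.abs(z) ** 2
    n_frame, n_hop = int(round(frame * fs)), int(round(hop * fs))
    starts = np.arange(0, len(x) - n_frame + 1, n_hop)
    centers = (starts + n_frame / 2) / fs
    track = np.array([num[s:s + n_frame].sum() / max(den[s:s + n_frame].sum(), 1e-12)
                      for s in starts])
    return centers, track

times, rising = running_iwaif(glide(900.0, 1100.0))
_, falling = running_iwaif(glide(1100.0, 900.0))
inside = (times > 0.028) & (times < 0.072)          # frames lying wholly within the 50-ms sweep
slope = np.polyfit(times[inside], rising[inside], 1)[0]   # Hz per second
```

Within the sweep, the rising track increases frame by frame and the falling track decreases. The fitted slope is close to the nominal 200 Hz per 50 ms (4000 Hz/s). A model of slope and direction discrimination would compare such tracks.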

6.  Additional  experiments 

Several experiments not described in the 1988 proposal have been conducted as part of this project. One has been mentioned in section 1 above: John Madden's dissertation project, which extended the glide vs. step discrimination task to listeners with sensorineural hearing loss, was not anticipated in the proposal. In addition to being of value for understanding the deleterious effects of hearing loss on the ability of human listeners to process complex sounds, the project allows us to probe a bit further into possible physiological mechanisms underlying this ability. We are now formulating plans to extend the paradigm to persons implanted with a multi-channel cochlear implant. This work will likely be the basis for a dissertation project by Ina Bicknell. In addition to the information gained on the signal-processing abilities of implant wearers, this project may allow a direct test of our notion that neural synchrony is essential for optimum performance on this task. Since we can drive the auditory nerve directly through the implant processor, we should gain some insight into the underlying physiology for this task.

Another project not anticipated in the original proposal was the study of dichotic vs. diotic profile analysis conducted by Gail Whitelaw for her dissertation project. The PI has asserted that some part of the profile analysis processing was due to the frequency-modulation artifact generated when one tone of a multi-tone profile array was increased in level. Since such FM artifacts would be difficult to demonstrate in true dichotic stimuli, we designed a test of dichotic vs. diotic profile analysis. Little had been reported in the profile literature on the possibility of profile analysis in dichotic listening. What was available seemed contradictory: Green's associates found little support for dichotic profile analysis in their listeners, but Fantini et al. reported reduced but substantial dichotic profile analysis. Whitelaw's work supports that of Fantini et al. A manuscript for submission to the Journal of the Acoustical Society of America is nearly complete.

(A  copy  of  the  draft  is  appended  to  this  report.) 

Finally, the entrée into dichotic signals led us to the reports by Clifton and her associates on the dynamic effects of prior stimulation on echo suppression. The published work had always been reported for sound-field listening conditions. Since the precedence effect and other forms of echo suppression can be demonstrated under headphones, we questioned the apparent lack of headphone listening data. Pat Burton chose to follow this question for her master's thesis work. The thesis is nearly complete. We anticipate a defense by Sept. 1992. A copy of the completed work will be forwarded at that time.
No patentable inventions have resulted from this research.

General  Statements 

Often the impact of a research project is assessed solely by the number of publications it has produced. This project might be judged to have been of little impact if the number of papers published (to date) were the only criterion. The PI notes here that he has been lax in getting the results of this work submitted to the journals as promptly as he should have. Several more manuscripts derived from these three years of support will be completed and submitted over the next several months. The work supported by this grant has nevertheless had an impact on the field. We note here the work by Richards, Onsan, and Green, “Auditory profile analysis: Potential pitch cues” [Hearing Research, 39, 27-36 (1989)]; Kidd, Mason, Uchanski, Brantley, and Shah, “Evaluation of simple models of profile analysis using random reference spectra” [J. Acoust. Soc. Amer., 90, 1340-1354 (1991)]; and Versfeld and Houtsma, “Perception of spectral changes in multi-tone complexes” [The Quarterly J. of Experimental Psychology, 43A, 459-479 (1991)]. Modifications of the EWAIF model have been reported by B. Berg and H. Dai (currently, or formerly, of Green's research group at Florida).
Common Envelope Signals


I.  Introduction 

Voelker's (1966a,b) basic signal pair $(A, f_1; A + \Delta A, f_2)$ and $(A + \Delta A, f_1; A, f_2)$ is known to have the same envelope. How do we extend this idea of common-envelope pairs to multi-component signals? Two possibilities are discussed below. How these signals might be employed as auditory stimuli in actual experiments is as yet unclear.

II. Type I

We claim that the following signal pair has a common envelope. This pair, $s_1(t)$ and $s_2(t)$, is derived from the basic pair by duplicating it at frequency offsets $\omega_m$.

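$$ s_1(t) = \sum_m \left[ a\cos(\omega_a + \omega_m)t + b\cos(\omega_b + \omega_m)t \right] \tag{1} $$

$$ s_2(t) = \sum_m \left[ b\cos(\omega_a + \omega_m)t + a\cos(\omega_b + \omega_m)t \right] \tag{2} $$

Let us calculate the envelope of signal $s_1(t)$. The corresponding analytic signal $m_1(t)$ is

$$ m_1(t) = \sum_m \left[ a\cos(\omega_a + \omega_m)t + b\cos(\omega_b + \omega_m)t + ja\sin(\omega_a + \omega_m)t + jb\sin(\omega_b + \omega_m)t \right] \tag{3} $$

$$ = \sum_m \left[ a\exp j(\omega_a + \omega_m)t + b\exp j(\omega_b + \omega_m)t \right] \tag{4} $$

$$ = \left( a\exp j\omega_a t + b\exp j\omega_b t \right) \sum_m \exp j\omega_m t \tag{5} $$

The envelope $e_1(t)$ is given by

$$ e_1(t) = |m_1(t)| \tag{6} $$

$$ = \left| a\exp j\omega_a t + b\exp j\omega_b t \right| \, \left| \sum_m \exp j\omega_m t \right| \tag{7} $$

Similarly, the envelope of signal $s_2(t)$ is

$$ e_2(t) = \left| b\exp j\omega_a t + a\exp j\omega_b t \right| \, \left| \sum_m \exp j\omega_m t \right| \tag{8} $$

As in the case of the basic Voelker pair,

$$ \left| a\exp j\omega_a t + b\exp j\omega_b t \right|^2 = \left| b\exp j\omega_a t + a\exp j\omega_b t \right|^2 = a^2 + 2ab\cos(\omega_b - \omega_a)t + b^2 \tag{9} $$

Therefore signals $s_1(t)$ and $s_2(t)$ have the same envelope. We shall further investigate the instantaneous frequency.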
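Equation (9) can be checked numerically. The sketch builds a hypothetical Type I pair; all values are illustrative assumptions:

- a = 1.0 and b = 0.6;
- f_a = 1000 Hz and f_b = 1100 Hz;
- copies at offsets of 0, 300, and 700 Hz.

It compares the two envelopes. It also anticipates the instantaneous-frequency question by comparing an intensity-weighted mean instantaneous frequency for the two signals. Each signal is a finite sum of positive-frequency cosines, so its analytic signal is formed directly as the matching sum of complex exponentials.

```python
import numpy as np

FS = 20_000
t = np.arange(FS) / FS          # 1 s, so every component below completes whole cycles

def analytic(components):
    """Analytic signal of sum(a*cos(2*pi*f*t)) for f > 0: sum(a*exp(j*2*pi*f*t))."""
    return sum(a * np.exp(2j * np.pi * f * t) for a, f in components)

def iwaif(components):
    """Intensity-weighted average of instantaneous frequency, Hz."""
    z = analytic(components)
    dz = sum(a * 2j * np.pi * f * np.exp(2j * np.pi * f * t) for a, f in components)
    return np.sum(np.imag(np.conj(z) * dz)) / (2 * np.pi * np.sum(np.abs(z) ** 2))

# Hypothetical Type I pair: basic pair (a at fa, b at fb) and its swap, copied at offsets fm (Hz).
a, b, fa, fb = 1.0, 0.6, 1000.0, 1100.0
offsets = [0.0, 300.0, 700.0]
s1 = [(a, fa + fm) for fm in offsets] + [(b, fb + fm) for fm in offsets]
s2 = [(b, fa + fm) for fm in offsets] + [(a, fb + fm) for fm in offsets]

envelope_gap = np.max(np.abs(np.abs(analytic(s1)) - np.abs(analytic(s2))))
```

The envelopes agree to rounding error. The intensity-weighted mean frequencies differ by roughly 47 Hz (about 1360 Hz versus 1407 Hz for these values). The pair therefore differs only in its angle modulation.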


III.  Type  II 

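The following pair of signals are spectral ramps. The component frequencies are equally spaced, increasing linearly with the index $m$. Their amplitudes change by a constant factor $\alpha$ from one component to the next, i.e., linearly on a log scale.

$$ s_1(t) = \sum_{m=0}^{N} A\alpha^{m}\cos(\omega_c + m\omega_0)t \tag{10} $$

$$ s_2(t) = \sum_{m=0}^{N} A\alpha^{N-m}\cos(\omega_c + m\omega_0)t \tag{11} $$

The analytic signal corresponding to $s_1(t)$ is

$$ m_1(t) = \sum_{m=0}^{N} A\alpha^{m}\exp j(\omega_c + m\omega_0)t \tag{12} $$

$$ = A\exp(j\omega_c t)\sum_{m=0}^{N}\left(\alpha\exp j\omega_0 t\right)^{m} \tag{13} $$

The envelope is then

$$ e_1(t) = |m_1(t)| \tag{14} $$

$$ = A\left|\sum_{m=0}^{N}\left(\alpha\exp j\omega_0 t\right)^{m}\right| \tag{15} $$

Similarly, the envelope of $s_2(t)$ is

$$ e_2(t) = A\left|\sum_{m=0}^{N}\alpha^{N-m}\exp(jm\omega_0 t)\right| \tag{16} $$

Putting $n = N - m$ gives us

$$ e_2(t) = A\left|\exp(jN\omega_0 t)\sum_{n=0}^{N}\alpha^{n}\exp(-jn\omega_0 t)\right| = A\left|\sum_{n=0}^{N}\left(\alpha\exp(-j\omega_0 t)\right)^{n}\right| \tag{17} $$

Because $\alpha$ is real, the sum in (17) is the complex conjugate of the sum in (15), and a complex number and its conjugate have the same magnitude. Hence $e_1(t) = e_2(t)$, and the pair described in (10)-(11) specifies another pair of signals having the same envelope function.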
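The conjugate-symmetry argument can be confirmed numerically for one hypothetical ramp. All parameters are assumed for illustration: α = 0.8 (about -1.9 dB per component), N = 10, lowest component 500 Hz, spacing 100 Hz. The sketch forms each analytic signal directly as a sum of complex exponentials and compares the two envelopes on a time grid.

```python
import numpy as np

t = np.linspace(0.0, 0.05, 2001)     # any time grid works: the identity holds pointwise
A, alpha, N = 1.0, 0.8, 10           # hypothetical ramp: 11 components, about -1.9 dB per step
fc, f0 = 500.0, 100.0                # hypothetical lowest frequency and spacing, Hz

def ramp_envelope(weights):
    """|analytic signal| of sum_m A * weights[m] * cos(2*pi*(fc + m*f0)*t)."""
    m = np.arange(N + 1)[:, None]
    z = (A * weights[:, None] * np.exp(2j * np.pi * (fc + m * f0) * t)).sum(axis=0)
    return np.abs(z)

m = np.arange(N + 1)
e1 = ramp_envelope(alpha ** m)          # eq. (10): amplitudes fall with frequency
e2 = ramp_envelope(alpha ** (N - m))    # eq. (11): amplitudes rise with frequency
```

Although the two spectra are mirror images in level, `e1` and `e2` agree to rounding error. This is the numerical counterpart of (15) and (17) having conjugate sums.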


[Figure: vertical axis, amplitude (dB, log scale)]
Temporal Resolution in Normal-Hearing and Hearing-Impaired Listeners Using Frequency-Modulated Stimuli
This study compares the temporal resolution of frequency-modulated sinusoids by normal-hearing and hearing-impaired subjects in a discrimination task. One signal increased linearly by 200 Hz in 50 msec. The other was identical except that its trajectory followed a series of discrete steps. Center frequencies were 500, 1000, 2000, and 4000 Hz. As the number of steps was increased, the duration of the individual steps decreased, and the subjects' discrimination performance monotonically decreased to chance. It was hypothesized that the listeners could not temporally resolve the trajectory of the step signals at short step durations. At equal sensation levels, and at equal sound pressure levels, temporal resolution was significantly reduced for the impaired subjects. The difference between groups was smaller in the equal sound pressure level condition. Performance was much poorer at 4000 Hz than at the other test frequencies in all conditions because of poorer frequency discrimination at that frequency.

KEY  WORDS:  temporal  resolution,  frequency  modulation,  sensorineural  hearing  loss 


Studies of auditory temporal resolution measure the ability of the auditory system to resolve temporal changes in an acoustic signal. A number of approaches have been employed over the last several decades to investigate the temporal resolution of the normal auditory system, among them temporal modulation transfer studies (Viemeister, 1979), forward-masking studies (Nelson & Freyman, 1987; Plomp, 1964), and gap-detection studies (Fitzgibbons, 1983; Shailer & Moore, 1985, 1987). For the most part, the experimental task of choice has been gap detection, probably because of its relative convenience. Several issues have been raised in this research that have not been completely resolved. One issue is the effect of hearing impairment. Some studies have found significantly poorer gap-detection thresholds in hearing-impaired listeners in comparison with normal-hearing listeners (e.g., Fitzgibbons & Wightman, 1982), but others have not (Florentine & Buus, 1984). Another issue is the effect of frequency on gap-detection performance. In gap-detection studies using narrow-band noise stimuli, a decrease in gap-detection threshold with increasing frequency has been observed in both normal and impaired ears (Fitzgibbons & Gordon-Salant, 1987). Fitzgibbons argues that the faster response of the more broadly tuned high-frequency auditory filters accounts for the increase in temporal acuity at the higher frequencies. However, in a study by Moore, Glasberg, Donaldson, McPherson, and Plack (1989) using sinusoidal stimuli, gap-detection thresholds did not change significantly between 400 and 2000 Hz, and the results of Formby and
Forest (1991) indicate that gap-detection thresholds are independent of frequency between 500 and 4000 Hz.

*Currently affiliated with the Speech and Hearing Department, Cleveland State University, Cleveland, Ohio.

Madden & Feth: Temporal Resolution Using Frequency-Modulated Stimuli 437

Gap-detection tasks, as well as forward-masking and temporal modulation transfer studies, involve the resolution of amplitude changes in experimental stimuli. The literature on temporal resolution is dominated by such studies. There are a few exceptions to this generalization. For example, Jesteadt, Bilger, Green, and Patterson (1976) compared the temporal resolution of the normal and impaired ears of listeners with unilateral hearing losses using Huffman sequences as stimuli (Huffman, 1962). The results indicated that temporal acuity was better for the ear showing the poorer hearing in 8 out of the 10 subjects tested. Studies of the detection of frequency modulation (FM) in sinusoids have implications for temporal acuity (Kay, 1982, presents a review of this research). However, direct investigations of temporal resolution using FM stimuli are relatively rare, despite the ubiquity of FM in naturally occurring sounds such as speech. Studies of temporal acuity in hearing-impaired listeners using FM stimuli are nearly nonexistent.

Feth, Neill, and Krishnamurthy (1989) recently investigated normal temporal resolution with a new discrimination task in which FM stimuli were used. Subjects were asked to discriminate between two sinusoidal signals. One signal, the glide, made a transition from a lower frequency to a higher frequency over a smooth, linear path. The other signal, called the step signal, began and ended at the same frequencies as the glide, but its trajectory followed a series of discrete steps. That is, the signal remained at one frequency for a brief time before abruptly jumping to the next frequency. Normal-hearing listeners were able to distinguish step from glide signals easily when the number of steps was small, but as the number of steps increased, discrimination performance monotonically decreased to chance. It was assumed that at this point the limits of the listener's ability to resolve the discontinuous trajectory of the step signal had been reached, and it was indistinguishable from the glide signal. From the performance of the subject on this task, it was therefore possible to make inferences about the listener's temporal resolution capacity.

Results obtained by Feth et al. (1989) indicated a temporal resolution threshold of about 6 to 10 msec for normal-hearing listeners at center frequencies from 250 to 2000 Hz, a range that is comparable to estimates of temporal resolution found in gap-detection studies (e.g., Fitzgibbons & Wightman, 1982; Glasberg, Moore, & Bacon, 1987). However, resolution at 4000 Hz was much poorer, in the 15-20-msec range. Frequency transitions ranged from 100 to 400 Hz, and signal durations from 25 to 100 msec were used.

The  major  purpose  of  the  present  study  was  to  compare 
temporal  resolution  in  listeners  with  moderate  hearing  losses 
of  presumed  sensorineural  origin  with  that  of  normal-hearing 
listeners,  using  FM  signals.  A  second  goal  was  to  investigate 
the  effect  of  frequency  on  the  resolution  of  FM  signals. 

Method

Subjects 

Five hearing-impaired and 5 normal-hearing listeners participated in the study. The normal-hearing subjects ranged in age from 20 to 22 years and had pure-tone air-conduction thresholds of less than 15 dB HL (ANSI, 1969) between 500 and 4000 Hz.

The ages and hearing thresholds of the test ears of the hearing-impaired subjects are given in Table 1. All had bilateral sensorineural hearing losses. Hearing thresholds in the nontest ear were no more than 10 dB lower than the test-ear thresholds at the respective test frequencies. Bone-conduction testing indicated air-bone gaps of 5 dB or less in all subjects. All hearing losses were long-standing, and there were no indications of retrocochlear involvement in any of the subjects. The hearing losses of H1 and H2 were apparently congenital. H4 reported that her loss was associated with a high fever suffered early in childhood. H3 and H5 indicated that the onset of their hearing losses occurred in adulthood.

Stimuli 

Glide and step signals. The glide and step signals were generated and stored in digital form on a Zenith Z159 microcomputer. A 16-bit digital-to-analog converter (Quikki) operating at a 20-kHz sampling rate converted the stored signals to analog waveforms. The resulting signals were low-pass filtered at 8000 Hz. The glide signals were sinusoidal sweep tones with center frequencies of 500, 1000, 2000, and 4000 Hz. The frequency transition was 200 Hz over 50 msec, producing a 4-Hz/msec rate of frequency change. Rise/fall time was 5 msec, resulting in an overall signal duration of 60 msec. Signal onsets and offsets were shaped by a cosine-squared function. The signal duration was chosen to approximate the duration of formant transitions in the speech signal. The step signals traversed the same frequency range as the glide signals, but did so in discrete steps. The number of steps varied between two and nine. Schematic representations of the glide and step signals are shown in Figure 1.
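The stimulus description can be sketched as a frequency trajectory integrated into a phase-continuous sinusoid. Several details are assumptions, not statements from the study:

- the 5-msec ramps sit at the start and end frequencies;
- the 50-msec transition is divided into equal-duration steps at equally spaced frequencies;
- the steps are left abrupt, although the study rounded their corners.

```python
import numpy as np

FS = 20_000   # Hz, the sampling rate given in the text

def trajectory(fc, n_steps=None, sweep=200.0, dur=0.050, ramp=0.005, fs=FS):
    """Per-sample frequency (Hz): a linear glide if n_steps is None, else an n-step staircase."""
    n_ramp, n_sweep = int(round(ramp * fs)), int(round(dur * fs))
    f_lo, f_hi = fc - sweep / 2, fc + sweep / 2
    if n_steps is None:
        middle = np.linspace(f_lo, f_hi, n_sweep)
    else:
        levels = np.linspace(f_lo, f_hi, n_steps)                   # equally spaced step frequencies
        middle = levels[(np.arange(n_sweep) * n_steps) // n_sweep]  # equal-duration steps
    return np.concatenate([np.full(n_ramp, f_lo), middle, np.full(n_ramp, f_hi)])

def make_signal(fc, n_steps=None, ramp=0.005, fs=FS):
    """Phase-continuous sinusoid following the trajectory, with cosine-squared onset/offset."""
    f = trajectory(fc, n_steps, ramp=ramp, fs=fs)
    x = np.sin(2 * np.pi * np.cumsum(f) / fs)
    n_ramp = int(round(ramp * fs))
    w = np.sin(0.5 * np.pi * np.arange(n_ramp) / n_ramp) ** 2
    x[:n_ramp] *= w
    x[-n_ramp:] *= w[::-1]
    return x

glide_1k = make_signal(1000.0)                  # 900 -> 1100 Hz in 50 msec, 60 msec overall
four_step_1k = make_signal(1000.0, n_steps=4)
step_ms = {n: 50.0 / n for n in range(2, 10)}   # step duration under the equal-steps assumption
```

Under the equal-duration assumption, two to nine steps give nominal step durations from 25 msec down to about 5.6 msec.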

Spectral analysis. Abrupt frequency jumps such as those in the step signal generate off-frequency spectral energy, which is a potential discrimination cue. To minimize this potential confounding variable, the step signals were generated with rounded "corners." Spectral analysis indicated that the long-term spectra of the step signals were essentially identical to that of the glide signal. Figure 2 shows a comparison of the long-term electrical spectra of a four-step signal and a glide signal. The acoustical spectra of the signals were essentially identical to their electrical spectra.
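The study does not say how the corners were rounded. One possible approach, shown purely as an assumption, is to smooth the staircase frequency trajectory with a short unit-area Hann kernel before integrating it into phase. The 2-ms smoothing length is likewise a hypothetical choice. The sketch checks that the end frequencies are preserved while the largest sample-to-sample frequency jump shrinks.

```python
import numpy as np

FS = 20_000

def step_track(f_lo=900.0, f_hi=1100.0, n_steps=4, dur=0.050, fs=FS):
    """Abrupt staircase frequency track (Hz), one value per sample."""
    n = int(round(dur * fs))
    levels = np.linspace(f_lo, f_hi, n_steps)
    return levels[(np.arange(n) * n_steps) // n]

def round_corners(track, smooth=0.002, fs=FS):
    """Smooth a frequency track with a unit-area Hann kernel (hypothetical corner rounding)."""
    kernel = np.hanning(int(round(smooth * fs)) + 2)[1:-1]   # drop the zero-valued end points
    kernel /= kernel.sum()
    pad = len(kernel)
    padded = np.concatenate([np.full(pad, track[0]), track, np.full(pad, track[-1])])
    return np.convolve(padded, kernel, mode="same")[pad:-pad]

abrupt = step_track()
rounded = round_corners(abrupt)
```

The rounded track still starts at 900 Hz and ends at 1100 Hz. Its largest per-sample jump is a few hertz rather than about 67 Hz, which is the property intended to reduce off-frequency splatter.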

Procedures 

Signal levels. The hearing-impaired subjects were tested in quiet at a sensation level (SL) of 35 dB. The normal-hearing subjects were tested in two conditions. In the first condition, the quiet condition, the signals were presented in quiet at 35 dB SL; Feth et al. (1989) found that discrimination performance in normal-hearing subjects is asymptotic at intensities as low as 30 dB SL. In the second condition, the masked condition, the stimuli were presented at sound pressure levels (SPLs) that approximated the average levels used for the impaired ears: 75 dB at 500 Hz, 80 dB at 1000 Hz, 82 dB at 2000 Hz, and 83 dB at 4000 Hz. SPLs were determined using a flat-plate coupler. Broadband masking noise was low-pass filtered at 8000 Hz and combined with the signal to achieve a signal SL of approximately 35 dB. Thus, the normal-hearing subjects were compared to the hearing-impaired subjects at an equal SL in the quiet condition and at both an equal SPL and an equal SL in the masked condition.

TABLE 1. Hearing-impaired subject information. Hearing thresholds are in dB HL.

Subject   Age   500 Hz   1000 Hz   2000 Hz   4000 Hz
H1         24       30        35        40        40
H2         25       35        35        50        55
H3         74       45        45        45        65
H4         27       35        50        60        60
H5         44       50        60        60        55

FIGURE 1. Schematic representation of the step and glide signal trajectories. The solid line represents the step signal; the broken line represents the glide signal.

FIGURE 2. Long-term frequency spectrum of the glide signal (solid line) and a four-step signal (broken line).

Data collection. Subjects were tested in a single-walled sound-attenuating chamber. Stimuli were presented monaurally via Sennheiser HD414SL headphones. A two-cue, two-alternative forced-choice procedure (2Q, 2AFC) was used to determine step/glide discrimination performance. In each trial, a stimulus was presented in each of four listening intervals. The interstimulus interval was 400 msec. The subject was asked to pick the odd stimulus, which was always the step signal and was presented at random in interval two or three. Feedback indicating the interval containing the step signal was provided.
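The trial structure can be sketched as follows. Only the four-interval layout, with the odd (step) stimulus placed at random in interval two or three, comes from the text. The interval labels, the random seed, and the guessing listener are illustrative assumptions.

```python
import random

def make_trial(rng):
    """One two-cue 2AFC trial: glide cues in intervals 1 and 4, step signal in interval 2 or 3."""
    step_interval = rng.choice((2, 3))
    order = ["glide"] * 4
    order[step_interval - 1] = "step"
    return order, step_interval

rng = random.Random(1992)            # seeded only so the sketch is reproducible
trials = [make_trial(rng) for _ in range(2000)]
# A listener who guesses between intervals 2 and 3 should score near 50% (chance in 2AFC).
guess_rate = sum(rng.choice((2, 3)) == step for _, step in trials) / len(trials)
```

Because the first and last intervals always hold the glide, they serve as reminders of the standard, and the choice reduces to two alternatives.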

The stimuli were presented in blocks of 50 trials, and a
percent-correct score was calculated for each block. Each
subject was tested over at least three sets of three blocks for
each different step signal (nine blocks altogether, or 450
trials). For example, at least nine blocks were run for the
1000-Hz two-step signal, at least nine blocks for the 1000-Hz
three-step signal, and so on. If the subject showed no
improvement in percent correct over the last two sets of blocks,
data collection was ended for that signal. The percent-correct
score for that step duration (step duration being a function of
the number of steps in the signal) was obtained by calculating
the mean of the percent-correct scores from the last six blocks.
In the few cases where improvement continued, additional sets
of three blocks were run until no further improvement was
found, and percent correct was again calculated from the last
six blocks. Thus, each data point in the individual results is
based on 300 discrimination trials. The data points were used
to construct psychometric functions for each of the test
frequencies in each of the experimental conditions.
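The block and stopping rules above can be expressed as a short loop. The text does not define "no improvement" precisely, so this sketch assumes it means that the mean of the newest set of three blocks is no more than a tolerance above the mean of the set before it; run_block, tolerance, and max_sets are hypothetical names introduced only for illustration.

```python
from statistics import mean
from typing import Callable

BLOCK_SIZE = 50   # trials per block (from the text)
SET_SIZE = 3      # blocks per set (from the text)
MIN_SETS = 3      # at least three sets, i.e., nine blocks (from the text)
N_SCORED = 6      # the score is the mean of the last six blocks
TRIALS_PER_POINT = N_SCORED * BLOCK_SIZE  # 300 trials behind each data point


def run_step_condition(run_block: Callable[[], float],
                       tolerance: float = 0.0,
                       max_sets: int = 10) -> tuple[float, list[float]]:
    """Run sets of blocks for one step signal until performance levels off.

    run_block() returns the percent correct of one 50-trial block.
    ASSUMPTION: "no improvement" means the newest set's mean is at most
    `tolerance` points above the previous set's mean. `max_sets` is a
    hypothetical safety limit. Returns (score, all block scores).
    """
    blocks: list[float] = []
    for n_sets in range(1, max_sets + 1):
        blocks.extend(run_block() for _ in range(SET_SIZE))
        if n_sets < MIN_SETS:
            continue  # always complete the first three sets
        newest = mean(blocks[-SET_SIZE:])
        previous = mean(blocks[-2 * SET_SIZE:-SET_SIZE])
        if newest - previous <= tolerance:
            break     # performance has levelled off
    return mean(blocks[-N_SCORED:]), blocks
```

For example, block scores of 60, 62, 64, 70, 72, 74, 78, 80, 82, 79, 81, 80 keep improving through the third set, stop after the fourth, and yield a score of 80, the mean of the last six blocks.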

All of the subjects readily learned the procedure except for
the oldest subject, H3, who had difficulty with the rate of
stimulus presentation. However, when stimulus presentation
was slowed to one half its normal rate, the subject quickly
mastered the task. All subjects were well practiced in the task
when data collection began.


Results 

The psychometric functions in Figure 3 display the mean
discrimination results at center frequencies from 500 to 4000
Hz for all conditions. Percent-correct discrimination is plotted
as a function of the step duration of the step signal. The open
symbols indicate data obtained from the normal-hearing and
the hearing-impaired subjects in the quiet condition; the
filled symbols represent data obtained from the normal-hearing
subjects in the masked condition. The temporal resolution
threshold (TRT) was defined as the step duration yielding 75%
correct discrimination. The mean TRT for each condition at
each test frequency is given in Table 2. The threshold values
are the points on the x-axis at which the psychometric
functions intercept the 75%-correct level, estimated to the
nearest 0.5 ms.
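According to the note to Table 2, the thresholds were read from Figure 3 by eye. A possible automated equivalent, offered only as a sketch, is to interpolate linearly between the two tested step durations whose scores bracket 75% and round to the nearest 0.5 ms; linear interpolation is an assumption, since the shape of the curve between data points is not specified.

```python
from typing import Optional, Sequence


def threshold_at(durations_ms: Sequence[float],
                 percent_correct: Sequence[float],
                 criterion: float = 75.0,
                 resolution: float = 0.5) -> Optional[float]:
    """Step duration at which the psychometric function reaches `criterion`.

    Interpolates linearly between the first pair of adjacent points whose
    scores bracket the criterion (scores rise with step duration), then
    rounds to the nearest `resolution` ms (Python's round() breaks exact
    ties toward the even multiple). Returns None if no pair brackets the
    criterion, e.g. when the best score stays below 75%.
    """
    points = sorted(zip(durations_ms, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            x = x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
            return round(x / resolution) * resolution
    return None
```

A function whose best score is 73% (as for the normal listeners in quiet at 4000 Hz) returns None, matching the cases in which no threshold could be read.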

Table 2 indicates that the mean TRTs of the hearing-impaired
listeners are poorer than those of the normal-hearing listeners
at 500, 1000, and 2000 Hz. No comparison is possible at 4000
Hz because the maximum mean discrimination scores were below
the 75% criterion for both groups. The differences between the
two groups are clearly substantial, even in the masked
comparison. It is also evident from Table 2 that, for the
normal-hearing subjects, temporal resolution is poorer at every
frequency in the masked condition than in the quiet condition.

FIGURE 3. Psychometric functions showing the mean
discrimination results (percent correct as a function of step
duration, in ms). The open circles represent results obtained
from the normal subjects at a signal level of 35 dB SL. The filled
circles represent the results from the normal subjects at a
signal level approximating that of the impaired subjects, with
broadband masking noise added to produce a sensation level
of 35 dB. The triangles are the data from the impaired subjects
at a signal level of 35 dB SL. The horizontal lines mark the
75%-correct discrimination point.

TABLE 2. Mean temporal resolution thresholds (msec) for each of
the experimental conditions.

                 500 Hz       1000 Hz      2000 Hz      4000 Hz
Condition       M     SD     M     SD     M     SD     M      SD
Normal quiet    7.0   1.2    8.0   2.3    9.0   3.5   25.0*   -
Normal masked   9.5   1.5   12.0   1.0   12.0   3.0    -      -
Impaired       13.0   3.3   16.0   3.8   18.0   3.7    -      -

Note. Values were obtained by visual inspection of Figure 3 and are
estimated to the nearest 0.5 msec. A mean discrimination score of
75% correct was not achieved or approximated at the longest step
duration in the normal masked and impaired conditions at 4000 Hz.
*Maximum mean discrimination was actually 73%.

Several analyses of variance were performed to support the
conclusions reached through visual inspection of the data.
With group as a between-subjects factor, the difference in
TRTs between the normal and impaired listeners was
significant in both the quiet [F(1,8) = 13.90, p < .006] and the
masked [F(1,8) = 5.52, p < .047] conditions. Also, a
comparison of the results from the normal-hearing subjects in
the masked and the quiet conditions, with masking treated as
a within-subjects factor, indicated that the effect of condition
was significant [F(1,4) = 37.24, p < .004].
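As a consistency check on these statistics, the reported degrees of freedom can be matched to the design. The check assumes five listeners in each group, which is inferred from the F ratios rather than stated in this section; the within-subjects line also covers the frequency analysis of the impaired group reported below.

```latex
\begin{aligned}
\text{between groups } (g = 2,\ N = 10):&\quad df = (g - 1,\ N - g) = (1,\ 8)\\
\text{within subjects } (k \text{ levels},\ n = 5):&\quad df = \bigl(k - 1,\ (k - 1)(n - 1)\bigr)\\
\text{masked vs.\ quiet } (k = 2):&\quad df = (1,\ 4)\\
\text{500, 1000, 2000 Hz } (k = 3):&\quad df = (2,\ 8)
\end{aligned}
```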

Table 2 also indicates that there is a dramatic increase in
TRT at 4000 Hz in all three conditions. Even at the longest
step duration (25 msec), only 3 of the normal subjects were
able to achieve the 75%-correct discrimination criterion at
4000 Hz in the quiet condition. None of the normal subjects
reached the criterion value at 4000 Hz in the masked
condition, and the impaired subjects also failed to achieve 75%
correct discrimination at 4000 Hz. For the normal-hearing
listeners, TRTs increased only slightly between 500 and 2000
Hz in the quiet and masked conditions. For the impaired
subjects, however, the increase in threshold between 500 and
2000 Hz is considerably greater.

One-way analyses of variance with frequency as a
within-subjects variable were performed on the data from
500, 1000, and 2000 Hz. The 4000-Hz data were not
included because too few of the subjects reached the
discrimination criterion at that frequency. The results indicated
that the effect of frequency was not significant over this
frequency range for the normal-hearing subjects, either in
quiet or in noise. However, the effect of frequency over the
500-2000-Hz range was significant for the impaired subjects
[F(2,8) = 4.95, p < .04]. This effect can be accounted for in
terms of hearing sensitivity. In Figure 4, TRT is plotted as a
function of hearing threshold level for the hearing-impaired
subjects. A strong positive relation between TRT and hearing
threshold is evident, and this is confirmed by correlational
analysis (r = 0.68, p < .01). In addition, the hearing thresholds
of the hearing-impaired subjects generally increase with
frequency. It therefore appears that the apparent increase in
TRT with frequency is in fact an increase in TRT with
increasing hearing threshold.
FIGURE 4. Temporal resolution of the individual hearing-impaired
subjects plotted as a function of hearing threshold. The
parameter is stimulus center frequency.
In summary, the effect of frequency on TRT appears to be
limited to 4000 Hz. There is no statistically significant effect of
frequency between 500 and 2000 Hz for the normal-hearing
subjects. The increase in TRT between 500 and 2000 Hz
observed in the hearing-impaired subjects appears to be a
function of their increasing hearing threshold at the higher
frequencies. In marked contrast to the results at the lower
frequencies, discrimination performance in all conditions was
so poor at 4000 Hz that the mean discrimination scores failed
to reach the criterion value even at the 25-msec step duration.

Overall, the individual results follow the trends described
above for the averaged data, with one exception. Normal
subject N5's TRTs are poorer than those of any of the other
normal-hearing subjects at nearly all test frequencies in both
the quiet and masked conditions. For example, N5's TRTs in
the quiet condition are as follows: 500 Hz, 9.5 msec; 1000 Hz,
13.0 msec; 2000 Hz, 16.5 msec. N5's results were consistent,
and the subject performed well in other aspects of the
experiment, such as the hearing threshold measurements,
indicating that lack of concentration or motivation is not a
likely explanation for these results. In Feth et al.'s (1989)
results from normal subjects, no listener's thresholds departed
this far from the mean. The data from N5 suggest that some
individuals with normal hearing sensitivity may have
abnormally large temporal resolution thresholds.

Discussion

The  Effect  of  Frequency 

For the normal listeners in quiet, there is no significant
variation in mean temporal resolution threshold between 500
and 2000 Hz, but at 4000 Hz the TRT increases to more than
25 msec. This pattern is not seen in studies using other
measures of temporal acuity. Formby and Forrest (1991), for
example, found that gap detection thresholds measured with
sinusoidal markers are independent of frequency from 500 to
4000 Hz. One possible explanation for the increase in TRT at
4000 Hz is that the auditory system tracks the frequency
changes in the step signal using information from phase-locked
neural discharges. If this were true, then temporal resolution
would be expected to deteriorate as phase locking declines. It
is well known that in monkeys and cats, neural phase locking
is robust below 1 kHz, declines gradually at higher frequencies,
and is absent above 4000 to 5000 Hz (Rose, Brugge, & Hind,
1967). The frequency effect observed in the normal subjects
fits this pattern. One might infer that the mechanism that takes
over signal tracking (perhaps rate-place coding) has a longer
time constant than the phase-locking mechanism.

There is another, less speculative explanation, however.
The step signal is, in effect, a sequence of steady tones
separated by almost instantaneous frequency transitions. As
the step duration decreases, the size of the frequency jump
between successive steps decreases as well. It can be
argued, therefore, that the subject's frequency discrimination
ability, rather than temporal acuity, is the limiting factor in the
task. Feth et al. (1989) investigated the role of frequency
discrimination in the step-glide discrimination task with
normal-hearing subjects. They varied the extent of the
frequency transition, using signals with transitions of 200 and
400 Hz while holding the signal duration constant at 50 msec.
If frequency discrimination were the limiting factor in the
discrimination task, then the subjects' performance should
improve for the 400-Hz transition signals, in which the
between-step jumps are twice those of the 200-Hz transition
signals. At center frequencies of 500, 1000, and 2000 Hz,
there was no significant improvement in mean temporal
resolution threshold when the transition size was increased:
the TRTs for the 200-Hz signals were within 0.5 ms of those
for the 400-Hz signals. These data support the contention that
frequency discrimination does not limit performance for
signals with frequency transitions of 200 Hz at center
frequencies of 2000 Hz and below.
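The trade-off between step duration and jump size can be made concrete with simple arithmetic. Assuming that the n step frequencies are spaced evenly from the initial to the final frequency (consistent with the two-step signal, which jumps directly from one to the other), an n-step signal of total duration T and excursion ΔF has steps of T/n and n - 1 jumps of ΔF/(n - 1). The function name step_geometry is hypothetical.

```python
def step_geometry(delta_f_hz: float, total_ms: float, n_steps: int) -> tuple[float, float]:
    """Return (duration of each step in ms, size of each jump in Hz).

    ASSUMPTION: n equal-duration steps whose frequencies are spaced evenly
    from the initial to the final frequency, so there are n - 1 jumps.
    """
    if n_steps < 2:
        raise ValueError("a step signal needs at least two steps")
    return total_ms / n_steps, delta_f_hz / (n_steps - 1)


if __name__ == "__main__":
    # Feth et al. (1989): 50-msec signals with 200- or 400-Hz transitions.
    for n in (2, 3, 5, 10):
        for delta_f in (200.0, 400.0):
            step_ms, jump_hz = step_geometry(delta_f, 50.0, n)
            print(f"{delta_f:5.0f} Hz, {n:2d} steps: "
                  f"{step_ms:5.1f} ms per step, {jump_hz:6.1f} Hz per jump")
```

At a 10-ms step duration (five steps), the jumps are 50 Hz for the 200-Hz transition and 100 Hz for the 400-Hz transition. Doubling ΔF doubles every jump while leaving step duration unchanged, which is why that comparison isolates frequency discrimination. Setting these jumps beside the DLFs in Table 3 is only a rough guide, since those DLFs were measured with 50-msec tones and a single step is much shorter.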

However, Feth et al. found that at a center frequency of
4000 Hz, the TRT obtained for the 200-Hz transition signal
was about 7 msec greater than the TRT for the 400-Hz
transition signal. These data strongly suggest that frequency
discrimination has a considerable effect on performance at
4000 Hz, where the frequency DL is relatively large (Moore,
1973; Wier, Jesteadt, & Green, 1977). Thus, the poor
step-glide discrimination observed at 4000 Hz in the present
study probably reflects the effect of frequency discrimination
rather than temporal resolution. In the Feth et al. study, a step
signal without rounded corners was used, and TRTs were
smaller, particularly at 4000 Hz. Nevertheless, the TRTs
obtained at the lower frequencies are very similar to those of
the present study, and the overall pattern of results is highly
similar in the two studies.


The  Effect  of  Frequency  Discrimination  in  the 
Hearing-Impaired  Subjects 

In the normal-hearing listeners, frequency discrimination
appears to affect step-glide discrimination only at 4000 Hz.
However, it may be argued that the poorer temporal resolution
of the hearing-impaired subjects, compared with the
normal-hearing listeners, is also due to poorer frequency
discrimination. To investigate this possibility, difference limens
for frequency (DLFs) were obtained for 3 of the normal-hearing
and 3 of the hearing-impaired subjects, and the correlation
between DLFs and TRTs for these subjects was computed. A
strong relationship between these two variables would
indicate that frequency discrimination is a major determining
factor in the poorer performance of the hearing-impaired
subjects.

To measure DLFs, 50-msec sinusoids (5-msec rise/fall
time) were presented at 35 dB SL in the same 2Q, 2AFC
task that was used for step/glide discrimination. An adaptive
procedure was used that estimated the 70.7%-correct point on
the psychometric function (Levitt, 1971). Table 3 displays the
DLFs of the subjects tested. Correlational analysis indicated
no relation between the DLFs and TRTs of these subjects
(r = .017). It therefore seems unlikely that frequency
discrimination played a major role in limiting the performance
of the hearing-impaired subjects.
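The 70.7% point is where Levitt's (1971) two-down, one-up rule converges: the frequency difference is reduced after two consecutive correct responses and increased after any error, which balances when p² = 0.5, i.e., p ≈ 0.707. The procedure used here was presumably of this type. The sketch below is illustrative only; the starting level, step size, number of reversals, and the Weibull "listener" are hypothetical choices, not the study's settings.

```python
import math
import random
from typing import Callable, Optional


def two_down_one_up(p_correct_at: Callable[[float], float],
                    start: float, step: float,
                    n_reversals: int = 12, discard: int = 4,
                    max_trials: int = 2000, seed: int = 1) -> Optional[float]:
    """Transformed up-down track of the ~70.7%-correct frequency difference.

    p_correct_at(x) gives the probability of a correct response at a
    frequency difference of x Hz. The estimate is the mean level at the
    reversals after the first `discard`; None if there are too few.
    """
    rng = random.Random(seed)
    x, n_in_row, direction = start, 0, 0   # direction: -1 harder, +1 easier
    reversals: list[float] = []
    for _ in range(max_trials):
        if rng.random() < p_correct_at(x):
            n_in_row += 1
            if n_in_row < 2:
                continue                   # one correct: level unchanged
            move = -1                      # two correct in a row: harder
        else:
            move = +1                      # any error: easier
        n_in_row = 0
        if direction != 0 and move != direction:
            reversals.append(x)            # track changed direction here
            if len(reversals) == n_reversals:
                break
        direction = move
        x = max(step, x + move * step)     # keep the difference positive
    kept = reversals[discard:]
    return sum(kept) / len(kept) if kept else None


def weibull_2afc(x: float, alpha: float = 10.0, beta: float = 2.0) -> float:
    """Hypothetical listener: 50% correct at x = 0, rising toward 100%."""
    return 1.0 - 0.5 * math.exp(-((x / alpha) ** beta))
```

For this hypothetical listener the 70.7% point lies near 7.3 Hz, so a single track started at 20 Hz with 1-Hz steps should settle in that neighbourhood.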
TABLE 3. DLFs in Hz for normal-hearing and hearing-impaired
listeners.

Subject            500 Hz   1000 Hz   2000 Hz
Normal-hearing
  N2                 3.0       5.1      31.6
  N3                 7.5      15.5      25.6
  N4                 4.8       7.1       9.2
  Avg                5.1       9.2      22.0
Hearing-impaired
  H2                 6.6      10.0       7.2
  H4                 3.1       6.6      14.5
  H5                 5.4      16.6      29.3
  Avg                5.0      11.1      17.0

The frequency discrimination results are similar to those
obtained in other studies. Hall and Wood (1984), also using
50-msec signals, obtained the following results: at 500 Hz,
normal-hearing subjects, 1.2-4.2 Hz, and impaired subjects,
4.1-21.0 Hz; at 2000 Hz, normal-hearing subjects, 1.9-9.6 Hz,
and impaired subjects, 4.3-25.7 Hz. The somewhat higher
values in the present results are probably due to differences in
presentation level (Hall & Wood used 90 dB SPL) and practice
time: Hall and Wood's subjects were given 4 hours of practice,
whereas the present subjects received less than 2 hours. The
intent here was not to obtain optimal results, but to obtain
comparable results. The fact that the results of the
hearing-impaired subjects are very similar to those of the
normal-hearing subjects is somewhat surprising but not
unprecedented. Tyler, Wood, and Fernandes (1983),
commenting on their discrimination results, remark that many
subjects with hearing thresholds greater than 60 to 70 dB
SPL display normal frequency discrimination.

The  Effect  of  Stimulus  Level 

The performance of the normal-hearing subjects is poorer in
the masked condition than in the quiet condition. This result
prompts one to ask whether the degradation of performance in
the masked condition is due to the presence of masking noise
or to the higher level at which the stimuli are presented in that
condition. Preliminary step-glide discrimination data, obtained
from 3 of the normal-hearing subjects by presenting the stimuli
at the higher SPLs without masking noise, suggest that level is
the controlling variable: the discrimination performance of all 3
subjects decreased as stimulus level was increased. This
decline in performance with increasing level is unusual among
psychoacoustic phenomena. In general, performance in
auditory discrimination tasks improves with level up to at least
80 dB SPL. This is the case, for example, for gap detection
(Plomp, 1964) and frequency discrimination (Wier et al., 1977).

The question that now arises is whether the effect of absolute
level is the same for the impaired and the normal subjects.
That is, to what extent can the poorer temporal resolution of the
impaired subjects be accounted for by their higher listening
level? A difference in stimulus level clearly cannot account
entirely for the difference between the two groups: in the
masked condition, the normal subjects are compared with the
hearing-impaired group at approximately equal SPLs, and the
performance of the impaired subjects is still poorer. Two of the
hearing-impaired subjects, H2 and H4, were tested at a range
of levels. These subjects' discrimination performance begins to
decline at about 20 to 25 dB SL as stimulus level is decreased,
and at about 35 dB SL as stimulus level is increased. That is,
optimal temporal resolution was obtained at 25 to 35 dB SL in
the hearing-impaired subjects tested. (This indicates that an
optimal listening level was at least approximated for the
impaired listeners in the study.) One interpretation of these
data is that stimulus audibility imposes the lower cut-off point
for optimal performance and that the upper cut-off point is
determined by an intensity-discrimination function similar to
that of the normal ear. A problem with this explanation is that
signals at sensation levels below 25 dB ought to be quite
audible to the hearing-impaired subjects because of the rapid
growth of loudness at higher levels typically found in individuals
with hearing impairment of sensorineural origin (Sanders,
1979). Fitzgibbons (1984) argues that a stimulus of 20 dB SL
should be more than sufficiently loud to elicit optimal gap
detection results in hearing-impaired ears, despite the decrease
in performance below 30 dB SL in normal-hearing ears. In any
event, it appears that the dynamic range of the hearing-impaired
subjects is severely limited for the experimental task in this
study. Further research is needed to establish the exact nature
of this limitation.


Conclusions

The major findings of the study, as observed in normal-hearing
subjects and subjects with mild-to-moderate sensorineural
hearing losses, are as follows:

1. When normal-hearing and hearing-impaired subjects
were compared at equal sensation levels, mean temporal
resolution thresholds were significantly greater in the
hearing-impaired subjects. There was a strong positive
correlation between temporal resolution threshold and hearing
threshold for the hearing-impaired listeners.

2. When the normal-hearing and hearing-impaired subjects
were compared at equal sensation levels and equal sound
pressure levels, the mean temporal resolution threshold of the
hearing-impaired subjects was also significantly greater than
that of the normal-hearing subjects, but the difference was
smaller than in the equal-sensation-level condition.

3. Temporal resolution thresholds were essentially
independent of frequency at 500, 1000, and 2000 Hz.
Step/glide discrimination was much poorer at 4000 Hz than at
the other test frequencies, apparently because of poorer
frequency discrimination at that frequency.
Abstract

A means of determining the temporal acuity of the human auditory
system using frequency-modulated (FM) signals is proposed. These FM
signals have some characteristics in common with the formant transitions
of speech, and thus may be useful in relating psychoacoustic performance
to speech processing. Listeners with normal hearing were asked to
discriminate a sinusoid modulated linearly over a brief time interval
(a glide) from a signal covering the same frequency excursion in a
multiple-STEP trajectory. When the glide signal and the step signal
were just discriminable (75% correct in 2Q, 2AFC), we assumed that the
duration of a single step was less than the width of the auditory temporal
window. For presentation frequencies from 250 Hz through 2000 Hz, the
estimate of the width of the temporal window was 7 to 10 msec.
Presentation level had no effect on the results, at least for 30 and 50 dB
SL. Testing above 50 dB SL was limited because spectral differences
between the test signals might confound the results. Above 2 kHz,
performance in the discrimination task was poorer. In fact, at 4 kHz we
cannot rule out the possibility that listeners were basing their decisions
on just-discriminable frequency jumps rather than on temporal differences.
A follow-up at 6 kHz using different listeners led to equivocal results.
Introduction

Studies of auditory temporal resolution measure the ability of the
auditory system to follow rapid changes in an acoustic signal. Estimates
of temporal resolution have been derived from several experimental
tasks that use signals changing rapidly in amplitude, including forward
masking, temporal modulation transfer function (TMTF), and gap
detection studies. The results of these studies have been relatively
consistent: gap thresholds obtained with broadband noise markers are
on the order of 3 to 5 msec (e.g., Plomp, 1964; Florentine and Buus,
1984), and minimum temporal integration times observed in TMTF
studies are in the 2 to 8 msec range (Scott, 1986). In both gap
detection and TMTF studies, temporal acuity appears to be relatively
independent of level over a wide range, declining significantly only at
low levels, evidently as audibility decreases (Buus and Florentine,
1985; Viemeister, 1979).

Gap thresholds for narrowband noise markers decrease as the
center frequency of the stimulus increases. A similar effect has been
reported for TMTF studies (Fitzgibbons, 1979; Fitzgibbons and
Wightman, 1982; Scott, 1986). Initially, it was hypothesized that this
effect was due to the shorter response time of the more broadly tuned
auditory filters at the higher frequencies. However, there is now
considerable evidence that this effect is due to fluctuations in level that
are inherent in narrowband noise (Moore and Glasberg, 1977; Shailer
and Moore, 1987; Moore, 1989; Eddins et al., 1992). In the earlier gap
detection experiments, octave-band noise was used, so the bandwidth
of the stimuli decreased with decreasing center frequency. The slower
level fluctuations at the narrower bandwidths were more easily confused
with the gaps, producing larger gap thresholds at the lower frequencies.
In fact, when deterministic signals, which contain no level fluctuations,
are used as markers, gap thresholds are essentially independent of
frequency (Shailer and Moore, 1987; Formby and Forrest, 1991).

Several investigators have proposed a model to account for gap
detection and TMTF results. The model requires a bank of auditory
filters, with each channel followed by a nonlinear device, a temporal
window, and a level detector (Shailer and Moore, 1987; Green and
Forrest, 1988). Because the temporal window seems to play the major
role in determining the temporal resolution characteristics of the
auditory system, there has been a series of attempts to provide a simple
estimate of its width. Drawing on the analogy with the critical band,
these studies initially attempted to determine the width of a critical
masking interval (Penner, Robinson, and Green, 1972; Penner and
Cudahy, 1973; Robinson and Pollack, 1973; Penner et al., 1974). The
results of these studies were inconsistent, at least in part because of a
failure to control for the effects of "off-time" listening (Moore, Glasberg,
Plack and Biswas, 1988). Moore and his colleagues (Moore et al., 1988;
Plack and Moore, 1990) appear to have avoided this problem in
experiments that measure the threshold of a brief sinusoid presented in
a temporal gap between two noise bursts. Their results led them to
describe the temporal window as an asymmetric temporal
intensity-weighting function that could be modeled well by fitting
rounded exponentials to each side of the function. Plack and Moore's
(1990) estimate of the equivalent rectangular duration (ERD) of this
"roex" temporal window decreased from 13 to 9 msec as center
frequency increased from 300 to 900 Hz and then remained relatively
constant, declining slightly to 7 msec at 8100 Hz. They also noted an
effect of level: above 900 Hz, level increases produced somewhat
narrower ERD estimates. Moore et al. (1988) demonstrate that the
calculated output from a window of this general form gives reasonably
accurate predictions for at least some aspects of various temporal
phenomena involving amplitude-modulated signals, for example, the
detection of amplitude changes (Buunen and Valkenburg, 1979) and
temporal modulation transfer functions. It should be noted, however,
that Shailer and Moore (1987) suggested that a temporal integrator with
a 15-msec time constant could account for the absence of the marked
increases in gap threshold at low frequencies that might be expected
from ringing of the auditory filters. Apparently, there is still some
uncertainty as to the exact nature of the temporal window.
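The equivalent rectangular duration mentioned above is the area under the window divided by its peak height. Assuming each side of the window has the roex(p) form familiar from auditory-filter fitting, W(g) = (1 + pg)exp(-pg), with g the distance in msec from the window centre (the exact parameterization used by Moore et al. and by Plack and Moore may differ), each side contributes 2/p, so ERD = 2/p_before + 2/p_after. The slope values below are hypothetical, chosen only so that the ERD comes out at 9 msec, within the range quoted above.

```python
import math


def roex(g_ms: float, p: float) -> float:
    """Rounded-exponential weight (1 + p*g) * exp(-p*g) for g >= 0 (p in 1/ms)."""
    return (1.0 + p * g_ms) * math.exp(-p * g_ms)


def temporal_window(t_ms: float, p_before: float, p_after: float) -> float:
    """Asymmetric window with peak 1 at t = 0; t < 0 lies before the centre."""
    return roex(-t_ms, p_before) if t_ms < 0 else roex(t_ms, p_after)


def erd_closed_form(p_before: float, p_after: float) -> float:
    """Area under the window divided by its peak height of 1: 2/p per side."""
    return 2.0 / p_before + 2.0 / p_after


def erd_numeric(p_before: float, p_after: float,
                span_ms: float = 200.0, dt_ms: float = 0.01) -> float:
    """Midpoint-rule check of the same area over [-span_ms, +span_ms]."""
    n = int(round(2.0 * span_ms / dt_ms))
    return dt_ms * sum(
        temporal_window(-span_ms + (i + 0.5) * dt_ms, p_before, p_after)
        for i in range(n))


if __name__ == "__main__":
    # Hypothetical slopes: 2/0.4 + 2/0.5 = 5 ms + 4 ms = 9 ms.
    print(erd_closed_form(0.4, 0.5), erd_numeric(0.4, 0.5))
```

Shallower slopes (smaller p) widen the corresponding side, so the asymmetry of the fitted window shows up directly as unequal contributions to the ERD.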

In contrast to the multiplicity of studies with steady-state signals,
the literature contains relatively few studies of the temporal resolution
of signals that change rapidly in frequency or phase, despite the ubiquity
of such signals in natural sounds such as speech. Patterson and Green
(1970) measured temporal acuity using Huffman (1962) sequences, brief
waveforms in which energy is delayed in some frequency region. Green
(1973) found that subjects could detect a delay time of about 2 msec at
650, 1900, and 4000 Hz. Modulation-rate transfer functions for
frequency-modulated (FM) sinusoids can be interpreted as measuring the
ability of the auditory system to follow periodic spectral changes. For a
carrier frequency of 1 kHz, the detectability of FM sinusoids decreases
monotonically at modulation rates greater than about 2 Hz, as indicated
by the increased modulation depth needed for detection (Kay, 1982).
Detection then improves at rates higher than 100 Hz, apparently due to
the resolution of spectral sidebands. At modulation rates of less than 5
to 10 Hz, the frequency changes can be followed perceptually. Between
10 Hz and 100 Hz the sound is described as "rough" or "motorboating"
(Kay, 1982). These findings suggest that there is some degree of
temporal resolution of sinusoidal FM up to at least 100 Hz.

The purpose of this study was to investigate further the temporal
resolution of FM signals using a new experimental task. Subjects were
asked to discriminate between two frequency-modulated sinusoidal
signals. One signal, the glide, made a transition from a lower frequency
to a higher frequency along a smooth, linear path. The target signal,
called the step, was identical to the glide except that its trajectory
followed a series of brief steps; that is, the step signal remained at one
frequency for a brief time before jumping to the next frequency. Pilot
studies indicated, not surprisingly, that as the number of steps was
increased, discrimination became more difficult, and listener
performance decreased monotonically to chance. We assume that
discrimination threshold is reached when the listener's percentage of
correct discriminations falls below 75%. At this point, we infer, the
temporal window must be wider than the duration of an individual step
in the step signal. That is, changes in the signal that occur in less time
than the width of the temporal window are not distinguishable by the
listener.

This task may be a more direct measure of the purely temporal
properties of the auditory system than studies that involve the detection
of amplitude changes. The current model of temporal resolution implies
that gap detection performance, for example, depends on the sensitivity
of the level detector as well as on the properties of the temporal window
(Moore et al., 1989). Level detector sensitivity may play a major role in
the case of hearing-impaired subjects. As long as the frequency jumps
between steps are large enough to be easily discriminable, no such
confounding factors are present in this paradigm.

The study was designed to investigate the effects of several variables
on the temporal resolution of FM signals, including signal duration,
signal transition rate, transition size, center frequency, and
presentation level.
Methods


Stimuli

The signals used in the discrimination task were frequency-modulated
sinusoids. The frequency of the glide changed linearly over a brief time
interval, T. The extent of the frequency excursion is labeled ΔF. Thus,
GLIDE signals traversed ΔF Hz in T msec. The STEP signal at each center
frequency, fc, covered the same frequency excursion over the same
duration as a glide, but its frequency followed a multiple-step
trajectory. The simplest of the step signals remained at the initial
frequency for T/2 msec, then jumped to the final frequency for the
remainder of the signal. Other signals covered ΔF in three, four or more
equal-duration steps. Trajectories and long-term spectra for a glide and
a 4-step signal are shown in Madden and Feth (1992).

Figure 1 shows the response of the auditory filter bank to various STEP
and GLIDE signals. Each signal covers 400 Hz in 100 msec at a center
frequency of 1 kHz. The filter-bank response is taken from the first
stage of the Auditory Sensation Processing (ASP) model of Patterson et
al. (1992).

Figure  1  about  here 

GLIDE and STEP signals were generated off-line by a laboratory
microcomputer and stored on hard disk for use in each discrimination
run. They were converted to analog form at a 20-kHz sampling rate using
a 16-bit D-to-A converter (TTES Quikki board). The post-D-to-A filter
was set to a low-pass cutoff frequency of 8.5 kHz. Signals were
generated with 5-msec rise and fall times, shaped by raised-cosine
functions in the generation program. Frequency transitions did not
extend into the rise and fall portions of either signal.
Signals were generated at center frequencies of 0.25, 0.5, 1, 2, 4 and
6 kHz. ΔF values of 100, 200 and 400 Hz were used at each center
frequency. Values for T were 100, 50 and 25 msec, plus 5 msec of rise
and fall time.
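The stimulus construction described above can be sketched in Python with NumPy. This is a minimal sketch, not the original generation program: `inst_freq` and `synthesize` are hypothetical names, and the sketch assumes (a) the excursion is centered on fc, as the 150 to 350 Hz example for fc = 250 Hz in the Discussion implies, (b) the N plateau frequencies are equally spaced and include both end points, (c) the total duration is T plus one 5-msec ramp at each end, and (d) phase is continuous across each frequency jump, which the report does not specify.

```python
import numpy as np

FS = 20_000  # samples per second (the 20-kHz D-to-A rate)


def inst_freq(fc, delta_f, t_ms, n_steps=None, ramp_ms=5.0, fs=FS):
    """Instantaneous-frequency trajectory (Hz), one value per sample.

    n_steps=None gives the linear GLIDE; an integer gives an n_steps STEP
    signal. The transition occupies t_ms; the frequency is held at the
    start/end values during the rise and fall ramps (assumption).
    """
    n_ramp = int(round(ramp_ms * fs / 1000))
    n_trans = int(round(t_ms * fs / 1000))
    f_lo, f_hi = fc - delta_f / 2, fc + delta_f / 2  # excursion centered on fc
    if n_steps is None:
        trans = np.linspace(f_lo, f_hi, n_trans)  # smooth linear sweep
    else:
        # n_steps equal-duration plateaus, equally spaced from f_lo to f_hi
        levels = np.linspace(f_lo, f_hi, n_steps)
        idx = np.arange(n_trans) * n_steps // n_trans  # plateau index per sample
        trans = levels[idx]
    return np.concatenate([np.full(n_ramp, f_lo), trans, np.full(n_ramp, f_hi)])


def synthesize(freq, ramp_ms=5.0, fs=FS):
    """Phase-continuous sinusoid following `freq`, with raised-cosine ramps."""
    phase = 2 * np.pi * np.cumsum(freq) / fs  # integrate frequency to get phase
    n_ramp = int(round(ramp_ms * fs / 1000))
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    env = np.ones(len(freq))
    env[:n_ramp] = ramp          # raised-cosine onset
    env[-n_ramp:] = ramp[::-1]   # raised-cosine offset
    return env * np.sin(phase)
```

For example, `synthesize(inst_freq(1000, 400, 100))` and `synthesize(inst_freq(1000, 400, 100, n_steps=4))` give a 1-kHz GLIDE and a 4-step STEP signal of 2200 samples each, the STEP signal having 25-msec steps. Integrating frequency into phase, rather than switching between separate sinusoids, avoids phase discontinuities at each jump.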


Subjects

Subjects for this experiment were eight university students who were
recruited to serve in the study. All had hearing within normal limits
and negative otological histories. They ranged in age from 18 to 26
years. All were female. Listeners were paid an hourly wage for their
participation.

Procedures 

Testing was conducted in a four-interval, two-alternative procedure,
commonly called 2Q,2AFC. GLIDE signals were always presented in the
first and fourth intervals. The target signal, STEP, was presented in
either the second or the third interval, with equal probability, and a
GLIDE signal was presented in the remaining interval. The listeners were
instructed to indicate whether interval two or three contained the
"odd" signal. They were given feedback indicating the correct response
after each trial. Three blocks of 50 trials each, for a given pair of
GLIDE and STEP signals, were run in succession, and each listener's
percent-correct score for each block was recorded. New signals were then
selected for the next three-block set. Listeners were given brief rests
after each three-block run and longer breaks about every half hour.
Testing was usually conducted for three listeners at a time for two
hours per day. Results were based on at least six 50-trial blocks, with
no more than 150 trials for a given signal pair collected in one day.
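The 2Q,2AFC trial structure can be sketched as follows. `make_trial`, `run_block` and the `respond` callback are hypothetical names standing in for the laboratory software and the listener; the sketch only encodes the interval layout and the percent-correct scoring described above.

```python
import random


def make_trial(rng):
    """Interval contents for one 2Q,2AFC trial: GLIDE in intervals 1 and 4,
    STEP in interval 2 or 3 with equal probability."""
    step_interval = rng.choice((2, 3))
    intervals = ["GLIDE"] * 4
    intervals[step_interval - 1] = "STEP"
    return intervals, step_interval


def run_block(respond, n_trials=50, seed=None):
    """Percent correct for one block. `respond` maps the list of interval
    contents to the listener's answer (2 or 3); feedback follows each trial."""
    rng = random.Random(seed)
    n_correct = 0
    for _ in range(n_trials):
        intervals, answer = make_trial(rng)
        n_correct += respond(intervals) == answer  # True counts as 1
    return 100.0 * n_correct / n_trials
```

An observer that always picks the odd interval scores 100%, and one that guesses converges on 50%, the chance level toward which performance falls as the number of steps grows.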

Detection thresholds were determined for each of the GLIDE signals used
as standards in the discrimination testing. Listeners were tested in a
simple, adaptive 2AFC detection task. Once thresholds were determined at
each fc, discrimination testing was conducted with signals presented at
50 dB SL for each individual listener.
Results

Results of the discrimination testing were initially plotted as
psychometric functions; that is, percent correct discrimination was
plotted as a function of the number of steps in the STEP signal. For
economy of space and to highlight the importance of step duration, these
original psychometric functions were re-plotted with the duration of a
single step (rather than the number of steps in the whole signal) as the
abscissa. Plotted this way, discrimination of GLIDE versus STEP signals
could be compared across signal durations. To facilitate such
comparisons, three sets of psychometric functions were produced for each
center frequency tested. Each set contained psychometric functions for
the combinations of T and ΔF that result in the same transition rate,
and represented discrimination performance in percent correct as a
function of individual step duration for a transition rate of 2, 4 or
8 Hz/msec.
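The regrouping by transition rate follows directly from the stimulus parameters. The sketch below (hypothetical helper names) enumerates the nine (ΔF, T) combinations, keeps those whose rate ΔF/T is 2, 4 or 8 Hz/msec, and computes the re-plotted abscissa, the single-step duration T/N.

```python
from collections import defaultdict

T_MS = (100, 50, 25)           # transition durations (msec)
DELTA_F_HZ = (100, 200, 400)   # frequency excursions (Hz)


def group_by_rate(rates=(2, 4, 8)):
    """Map each transition rate (Hz/msec) to the (delta_f, T) pairs sharing it."""
    groups = defaultdict(list)
    for delta_f in DELTA_F_HZ:
        for t in T_MS:
            rate = delta_f / t
            if rate in rates:
                groups[rate].append((delta_f, t))
    return dict(groups)


def step_duration_ms(t_ms, n_steps):
    """Abscissa of the re-plotted psychometric functions: one step's duration."""
    return t_ms / n_steps
```

The combinations 100 Hz in 100 msec (1 Hz/msec) and 400 Hz in 25 msec (16 Hz/msec) fall outside the three sets, which is why the 2 and 8 Hz/msec panels of Figure 2 contain only two functions each.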

Temporal  resolution  at  1  kHz 

Figure 2 shows the averaged results for four listeners for STEP vs
GLIDE discrimination at fc = 1 kHz. Within each panel, either two or
three psychometric functions are shown. The top panel displays
performance for ΔF transitions of 200 Hz over 100 msec and 100 Hz over
50 msec. In the center panel, functions for transitions of 400 Hz over
100 msec, 200 Hz over 50 msec and 100 Hz over 25 msec are plotted. The
bottom panel shows 400 Hz over 50 msec and 200 Hz over 25 msec. Thus,
transition rates from the top to the bottom panel of the figure are 2, 4
and 8 Hz/msec. Symbol type (open circles = 100 msec, filled circles =
50 msec, triangles = 25 msec) always indicates the duration of the
signal pair. Solid lines denote ΔF = 400 Hz, medium-dashed lines ΔF =
200 Hz and dotted lines ΔF = 100 Hz.

Figure  2  about  here 
Visual inspection of the psychometric functions in Figure 2 shows a
STEP vs GLIDE discrimination threshold, determined from the intercept at
75%, of about 8 msec at 2 Hz/msec. For 4 Hz/msec the value is about
7 msec, and at the highest rate, 8 Hz/msec, it is 5 msec. Note that the
psychometric functions for 25- and 50-msec transitions are nearly
congruent. Those for 100-msec transitions appear to be shifted to the
left, indicating somewhat better discrimination for the longer sweeps.
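Reading a threshold off a psychometric function at P(C) = 75% can be sketched as linear interpolation between the two tested step durations that bracket the criterion. The report describes visual inspection, so this helper (`threshold_at`, a hypothetical name) is an assumed stand-in rather than the authors' procedure.

```python
import numpy as np


def threshold_at(step_ms, pct_correct, criterion=75.0):
    """Step duration (msec) where percent correct first reaches `criterion`.

    Assumes performance tends to rise with step duration. Interpolates
    linearly between the two tested durations that bracket the criterion;
    returns NaN if the criterion is never reached and the shortest tested
    duration if it is already met there (no extrapolation).
    """
    x = np.asarray(step_ms, dtype=float)
    y = np.asarray(pct_correct, dtype=float)
    order = np.argsort(x)            # sort by step duration
    x, y = x[order], y[order]
    reached = np.nonzero(y >= criterion)[0]
    if reached.size == 0:
        return float("nan")
    i = reached[0]
    if i == 0:
        return float(x[0])
    x0, x1, y0, y1 = x[i - 1], x[i], y[i - 1], y[i]
    return float(x0 + (criterion - y0) * (x1 - x0) / (y1 - y0))
```

Applied to each averaged psychometric function, a helper like this yields one threshold per combination of center frequency, transition rate and signal duration.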

Temporal resolution for fc from 250 Hz to 6 kHz

Figure 3 displays STEP vs GLIDE discrimination thresholds for center
frequencies ranging from 250 Hz to 6 kHz. Discrimination threshold is
taken as the step duration for P(C) = 75%. The parameter is transition
rate; thus, results have been collapsed over signal duration. The same
four listeners participated in the experiment through 4 kHz. Four new
listeners replaced the original listeners for the 6-kHz condition, which
was tested several months after the original data were collected.

Figure  3  about  here 

For fc values of 250, 500 and 2000 Hz, the results are similar to those
obtained at 1 kHz. At 2 kHz, the psychometric functions shift slightly
leftward, indicating greater sensitivity, as transition rate increases
from 2 to 8 Hz/msec. This improvement is not evident in the results at
500 and 250 Hz. The congruence of the psychometric functions for 50- and
25-msec transitions, and the small leftward shift of the 100-msec
functions, were not as marked in these results as they were at 1 kHz.

Performance in the STEP vs GLIDE discrimination task is much poorer for
center frequencies of 4 and 6 kHz. At 4 kHz, discrimination threshold
exceeds 20 msec at the 2 Hz/msec rate, improving to 7 or 8 msec at
8 Hz/msec. Similar results are apparent for 6 kHz, although it should
be noted that new listeners replaced the original ones. There is a
tendency for performance with shorter-duration transitions to be
shifted toward greater sensitivity at 6 kHz, at least for the two slower
rates (i.e., 2 and 4 Hz/msec).

Effect  of  presentation  level 

To determine the effect of level on STEP vs GLIDE discrimination, we
repeated the testing at 1 kHz over a range of levels. At 30 and 50 dB
SL, we determined STEP vs GLIDE discriminability for all three sweep
rates used in the previous testing. Four new listeners were employed for
this portion of the study, but procedures were essentially the same as
described above. There was, however, one important difference in the
signals used in this part of the study: a 17-step signal was used as the
standard (GLIDE) in these tests. This was because of spectral
differences between the glide and step signals that are evident about
50 dB down from the peak of the center lobe (for an example, see
Figure 2 in Madden and Feth, 1992). Because of its abrupt frequency
transitions, the STEP signal contains some energy splatter not present
in the glide signal. Discrimination at 70 dB SL could have been
confounded by this energy-splatter artifact. Using a 17-step transition
as the standard should minimize the influence of this confounding
factor. We chose to substitute for the original GLIDE signal rather than
introduce a low-level broadband masker to cover possible splatter.

Results are displayed in Figure 4. The height of each bar indicates the
temporal threshold determined at each presentation level. Bars are coded
to indicate the transition rates of 2, 4 and 8 Hz/msec. Only one sweep
rate was tested at 70 dB SL. The results indicate a slight decrease in
temporal threshold as level increases, but the differences are very
small, on the order of one or two milliseconds. The absence of a
substantial improvement in performance at 70 dB SL supports the
contention that these results are not contaminated by the low-level,
long-term spectral cues.
Note also that the results at 50 dB SL are essentially identical to
those shown at 1 kHz in Figure 2. It might be suggested that the step
signals were distinguishable from their respective glide signals in the
first part of the experiment because of spectral splatter. If this were
true, we would expect Figure 3, which displays results obtained with the
linear GLIDE, to reflect better performance than Figure 4, which displays
results with the 17-step "glide". This is clearly not the case.


Figure  4  about  here 


Discussion 


Summary  of  findings 

The STEP vs GLIDE discrimination performance of our listeners is very
consistent for fc values from 250 through 2000 Hz. On average, a
multi-step transition is distinguishable from its linear-sweep
counterpart when the individual steps are 5 to 10 msec in duration. We
suggest that the duration of a "just discriminable" single step is a
good indicator of the width of the temporal window of the normal
auditory system. However, we must rule out some obvious alternative
explanations.

First, we must consider whether the listeners are using long-term
spectral differences to distinguish between the glide and the step
signals. Given the abrupt change in frequency at each step, there must
be some splatter of energy in the step signal that is not present in the
glide signal. Spectra for equivalent glide and step signals reveal only
very small, non-systematic differences in the main lobe. Any significant
differences in the "tails" of the spectra, where we might expect
off-frequency listening to occur, are more than 50 dB down from the
level of the main lobe. Given that most of our testing was conducted at
50 dB SL, and given the results obtained with 17-step standard signals
in place of the linear glides, we
find  the  off-frequency  listening  explanation  difficult  to  accept. 

Another explanation of our listeners' ability to distinguish GLIDE from
STEP signals might be that, since each transition results in a frequency
change, listeners are performing a simple frequency discrimination task.
Two observations lead us to reject this explanation at frequencies below
2 kHz. First, at the just-discriminable step size, the frequency
differences are considerably larger than the normal difference limen for
frequency (DLF). For example, at 1 kHz, the duration of the just
discriminable step ranges from 10 msec at the 2 Hz/msec rate to 5 msec
at 8 Hz/msec. The concomitant frequency changes range from 20 Hz to
40 Hz at each step. Even considering the effect of shorter durations,
these values are larger than expected from simple frequency
discrimination measures (Moore, 1973). It is also difficult to
understand why the just discriminable frequency change should grow from
20 to 40 Hz as the rate of transition increases from 2 to 8 Hz/msec.
Second, the almost constant performance from 250 through 2000 Hz argues
against a simple frequency-DL explanation: discrimination dependent on
frequency differences should vary with fc as the DLF does.
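The frequency changes quoted above follow from the transition rate and the step duration. For a transition of ΔF Hz in T msec divided into N steps of duration d, the rate-times-duration approximation gives:

```latex
\begin{aligned}
d &= T/N, \qquad \Delta f_{\text{jump}} \approx \frac{\Delta F}{T}\,d = \frac{\Delta F}{N},\\
2~\text{Hz/msec} \times 10~\text{msec} &= 20~\text{Hz},\\
8~\text{Hz/msec} \times 5~\text{msec} &= 40~\text{Hz}.
\end{aligned}
```

If the N plateau frequencies include both end points of the excursion (an assumption, consistent with the 150, 217, 284 and 350 Hz example given for a 4-step, 200-Hz transition), the exact jump is ΔF/(N − 1), slightly larger than this approximation; for 4 steps it is ΔF/3 rather than ΔF/4.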

However, at 4 kHz and above, listener performance may be limited by
frequency discrimination. Unlike the lower frequencies, at 4 kHz the
just discriminable step duration varies with transition rate. As
transition rate increases from 2 Hz/msec to 8 Hz/msec, the just
distinguishable duration decreases from above 20 msec to about 7 msec.
If we calculate the size of the frequency transition at each 75% point
on the various psychometric functions, the values lie in the 50-Hz
range. Thus, it appears that a constant frequency jump, rather than a
constant step duration, may account for performance at 4 kHz.

We might then expect the results for 6 kHz to show similar behavior,
with a just discriminable frequency step somewhat larger than that at
4 kHz. If we assume that ΔF/F is constant, then at 4 kHz, 50/4000 =
0.0125. At 6 kHz, the DLF should then be 0.0125 × 6000, or 75 Hz. While
our listeners approach that value at 8 Hz/msec, most of the
frequency-jump values at 75% performance at the slower transition rates
are smaller than this predicted value. Comparison is hindered by the
fact that a different set of subjects was tested at 6 kHz.
Nevertheless, their performance does not show a constant
just-discriminable frequency jump across transition rates at 6 kHz.
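The arithmetic behind the constant-jump and constant-Weber-fraction arguments can be collected in one place. The jump sizes use the rate-times-step-duration approximation, and the 2 Hz/msec value is a lower bound because the threshold there exceeds 20 msec:

```latex
\begin{aligned}
\text{4 kHz, 2 Hz/msec:}\quad & \Delta f_{\text{jump}} > 2 \times 20 = 40~\text{Hz}\\
\text{4 kHz, 8 Hz/msec:}\quad & \Delta f_{\text{jump}} \approx 8 \times 7 = 56~\text{Hz}\\
\text{Weber fraction at 4 kHz:}\quad & \Delta F / F = 50/4000 = 0.0125\\
\text{Predicted DLF at 6 kHz:}\quad & 0.0125 \times 6000 = 75~\text{Hz}
\end{aligned}
```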

Relationship with critical bandwidth

The differential effect of bandwidth on many psychoacoustic phenomena is
well known (e.g., Scharf, 1970), and the concept of the critical band,
or auditory filter, has been shown to be important in the explanation of
temporal resolution phenomena. For example, using pairs of sinusoidal
markers of the same or different frequency, Formby and Forrest (1991)
found that gap thresholds increase as the frequency separation between
markers is increased. They then fit their data using a model of the
auditory filter. They assumed that in the gap detection task the
subject monitors the output level of a single auditory filter centered
on the first marker of the marker pair. Using a roex model of the
weighting function for the auditory filter, they then calculated the
amount of attenuation of the second marker at various frequency
separations. They were able to predict accurately the increase in gap
detection threshold, due to the attenuation of the second marker, as
the frequency separation between the markers increased.
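The filter-attenuation idea attributed to Formby and Forrest (1991) can be illustrated with a symmetric roex(p) weighting function, W(g) = (1 + pg) exp(−pg), where g = |f − fc|/fc is the normalized frequency deviation. As an assumption, this sketch sets p = 4fc/ERB using the Glasberg and Moore (1990) ERB formula; Formby and Forrest's actual parameter values are not given here, and the function names are hypothetical.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990); assumed here."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def roex_attenuation_db(filter_cf_hz, marker_hz):
    """Attenuation (dB) of a tone at marker_hz by a roex(p) filter at filter_cf_hz."""
    p = 4.0 * filter_cf_hz / erb_hz(filter_cf_hz)      # roex(p) has ERB = 4 * cf / p
    g = abs(marker_hz - filter_cf_hz) / filter_cf_hz   # normalized deviation
    weight = (1.0 + p * g) * math.exp(-p * g)          # filter power weighting
    return -10.0 * math.log10(weight)
```

With the filter centered on a 1-kHz first marker, the computed attenuation of the second marker grows steeply as it moves away in frequency, which in this kind of model is what raises the gap threshold.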

We wished to see whether the results of the present study can be
reconciled with a model that involves monitoring the output level of the
auditory filters. Greenwood (1991) has recently summarized a large body
of bandwidth estimates. At 250 Hz, a good estimate of the equivalent
rectangular bandwidth (ERB) is 50 Hz. All of the sweep widths used in
the present study (100, 200 and 400 Hz) exceed this ERB. As fc is
increased, the smaller ΔF values approximate the ERB. At 1 kHz, for
example, the ERB is about 150 Hz. Thus, the 100-Hz sweep falls within
one ERB, the 200-Hz sweep just exceeds one ERB, and the 400-Hz sweep
traverses several bandwidths. Above 2 kHz, all sweeps are contained
within one ERB.
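The sweep-versus-bandwidth comparison can be tabulated for every condition. As a stand-in for Greenwood's (1991) estimates, this sketch assumes the Glasberg and Moore (1990) ERB formula, which gives about 52 Hz at 250 Hz and about 133 Hz at 1 kHz (somewhat below the 150 Hz quoted above); the qualitative pattern is the same. The helper names are hypothetical.

```python
def erb_hz(f_hz):
    """Glasberg & Moore (1990) ERB in Hz; a stand-in for Greenwood's (1991) estimates."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def sweep_in_erbs(fc_hz, delta_f_hz):
    """Sweep width expressed in ERBs at the center frequency."""
    return delta_f_hz / erb_hz(fc_hz)


# Ratios above 1 mean the sweep spans more than one ERB.
for fc in (250, 500, 1000, 2000, 4000, 6000):
    ratios = [round(sweep_in_erbs(fc, df), 2) for df in (100, 200, 400)]
    print(fc, ratios)
```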

A 4-step signal at fc = 250 Hz with a 200-Hz transition would excite
those filters with center frequencies at each of its "steady-state"
frequencies (approximately 150, 217, 284 and 350 Hz). Filters with
center frequencies between these frequencies would be excited to a
lesser extent. The glide signal, however, would excite all filters
between 150 and 350 Hz. Thus, a mechanism monitoring the output level of
the individual filters would see two distinct excitation patterns over
time: the GLIDE would excite all filters over its range equally, whereas
the STEP would excite primarily those filters at its individual step
frequencies, and not those skipped over by the frequency jumps. At
2000 Hz, the picture would be quite different. For a 200-Hz transition,
because of the increased filter width, no filters would be "skipped" in
the jumps between steps; all auditory filters within the transition
range would be excited to the same extent. The monitoring mechanism
would see GLIDE and STEP filter-output patterns that are much less
distinct from one another than those produced at lower fc values. Such
a model would thus predict systematically poorer discrimination
performance as fc increases. Our results clearly do not show this.
Thus, our results suggest that a detection system using the output
levels of the auditory filters, such as has been shown to account for
the detection of amplitude changes in spectrally static signals, may not
be useful in explaining the temporal resolution of frequency-modulated
signals.
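The "skipped filter" argument can be made roughly quantitative by expressing each frequency jump in ERBs. This sketch assumes the Glasberg and Moore (1990) ERB formula, evaluates it midway between adjacent plateaus, and treats a jump larger than one ERB as skipping filters; that criterion is our crude assumption, not part of the report, and the helper names are hypothetical.

```python
import numpy as np


def erb_hz(f_hz):
    """Glasberg & Moore (1990) ERB in Hz (assumed stand-in)."""
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)


def step_levels(fc_hz, delta_f_hz, n_steps):
    """Plateau frequencies of an n-step signal, equally spaced across the sweep."""
    return np.linspace(fc_hz - delta_f_hz / 2, fc_hz + delta_f_hz / 2, n_steps)


def gap_in_erbs(fc_hz, delta_f_hz, n_steps):
    """Each jump expressed in ERBs, evaluated midway between adjacent plateaus.
    Values above 1 suggest that filters between the plateaus are 'skipped'."""
    levels = step_levels(fc_hz, delta_f_hz, n_steps)
    gaps = np.diff(levels)
    midpoints = (levels[:-1] + levels[1:]) / 2
    return gaps / erb_hz(midpoints)
```

For the 4-step, 200-Hz example, every jump exceeds one ERB at fc = 250 Hz, while at 2000 Hz each jump is well under half an ERB, matching the qualitative picture described above.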

Relationship  of  results  to  other  temporal  acuity  measures 

Estimates of auditory temporal acuity range from less than 1 msec to
more than 20 msec, depending on the task used to determine the temporal
threshold (see, for example, Green, 1971, 1973, 1985). Our results
appear to most closely resemble those from Plack and Moore's (1990)
careful determination of the shape of the temporal window. Both studies
produced indications of temporal resolution in the 7 to 10 msec range
over much of the auditory spectrum. In the Plack and Moore study, window
shapes were determined using a tone pulse located in a temporal gap
between two maskers. That is, they determined the limits of temporal
resolution imposed by forward and backward masking in a paradigm that is
the temporal analog of Patterson's (1976) auditory filter shape
determination. The similarities between our results and theirs suggest
that temporal masking may be the main limiting factor in the detection
of frequency changes in the step signal. As Plack and Moore (1990)
suggest, it may be that some mechanism smooths or integrates neural
activity ("neural information," as they put it) over a certain time
period. However, there are several interesting differences between the
Plack and Moore results and those of the present study.

One difference is seen in the effect of presentation level. The Plack
and Moore results show an improvement in temporal resolution, i.e., a
smaller ERD (equivalent rectangular duration), with increased level at
all test frequencies except 300 Hz. Our comparison of performance at
30 dB SL with that at 50 dB SL shows little difference, and even the
limited testing at 70 dB SL shows little change.

Both studies demonstrated an effect of signal frequency, but at opposite
ends of the spectrum. Plack and Moore report poorer performance at their
lowest test frequency (300 Hz), whereas we found degraded performance at
frequencies above 2 kHz. Looking first at Plack and Moore's 300-Hz
results, we note that estimates of auditory filter width at low
frequencies have been confounded by difficulties in specifying the level
of the masking noise (Fastl and Schorer, 1986). An under-specification
of masker level at low frequencies could lead to inflated estimates of
masking ability, whether the task is used to determine filter bandwidth
or temporal window shape.

Relationship  with  neural  synchrony 

Next we consider our results for fc above 2 kHz. The original subjects'
discrimination thresholds at 4 kHz are twice those at 2 kHz for the
2 Hz/msec transition rate. Our quick check with new listeners at 6 kHz
confirms this poorer performance at higher frequencies. As we suggested
in the discussion above, we cannot rule out frequency discrimination,
rather than temporal acuity, as the controlling factor for fc above
2 kHz.

This poor high-frequency performance contrasts with Plack and Moore's
results and with results obtained in gap detection studies using
deterministic signals (e.g., Formby and Forrest, 1991). We are tempted
to explain this difference by suggesting that our listeners' ability to
perform our discrimination task is related to synchronization of the
auditory nerve fiber responses. Since Plack and Moore's study required
only the detection of a tone in a temporal gap between noise maskers, we
should not expect their results to exhibit a dependence on
synchronization. In most mammalian ears, synchrony falters above 2 kHz
and is completely absent above 5 kHz (Rose, Brugge, Anderson and Hind,
1967; Anderson, Rose, Hind and Brugge, 1971). Also, Sinex and Geisler
(1981) have shown that the integrity of temporal coding at the lowest
instantaneous frequencies is preserved even at extremely high sweep
rates, well above those used for the linear glide stimuli of the present
study. They studied single-unit responses for transition rates from
0.2 kHz/sec for fibers with characteristic frequencies (CF) below 3 kHz,
and from 2 kHz/sec to 160 kHz/sec for fibers with CF above 3 kHz. (Note
that these rates are reported in kHz/sec while we used Hz/msec; the two
units are numerically equivalent.) Sinex and Geisler used a tone that
followed a trapezoidal frequency trajectory with center frequency near
the fiber's CF. Displays of inter-stimulus interval histograms show that
temporal discharge patterns for the units tracked instantaneous
frequency up to 2 kHz. The display is characterized as "less clear" for
frequencies above 2 kHz. The neural synchrony data suggest that the
mammalian auditory system is capable of temporally following the
frequency modulations of the STEP and GLIDE signals at the lower
frequencies but not at the higher frequencies. These findings are
consistent with the results of this study.

If it is true that STEP vs GLIDE discrimination depends upon the
synchrony of auditory nerve fibers, then one problem remains to be
explained. In the 4-kHz results, discrimination improves with increased
transition rate. The just discriminable step reaches 20 msec at
2 Hz/msec but is nearer 10 msec at 8 Hz/msec. If discrimination depended
upon the ability of nerve fibers to remain phase-locked to the
frequency-modulated signal, we would expect just the opposite result.

Conclusions 

We have devised a means of determining the temporal acuity of the human
auditory system using frequency-modulated signals. These signals have
some characteristics in common with the formant transitions of speech,
and thus may be useful in relating psychoacoustic performance in this
task to speech processing.

Listeners with normal hearing were asked to discriminate between
sinusoids that were modulated linearly over a brief time interval and
signals covering the same frequency excursion in a multiple-step
trajectory. When the glide signal and the step signal are just
discriminable (75% correct in 2AFC), we assume that the duration of a
single step is just less than the width of the temporal window.

For presentation frequencies from 250 Hz through 2000 Hz, our estimate
of the temporal window is 7 to 10 msec. Presentation level had no effect
on results, at least for 30 and 50 dB SL. We were cautious in our
testing above 50 dB SL because spectral differences between test signals
might confound our results.

Above 2 kHz, performance in the task was poorer. In fact, at 4 kHz we
cannot rule out the possibility that our listeners were basing their
decisions on just discriminable frequency jumps rather than on temporal
differences. A follow-up at 6 kHz using different listeners led to
equivocal results.

We are tempted to conclude that the ability of our listeners to
distinguish between glide and step signals at 2 kHz and below is related
to the synchronization of single-unit fibers in the periphery. Evidence
for this conclusion could only be "circumstantial" in psychoacoustic
studies. A physiological test in laboratory animals would likely be
required to provide support for this speculation.
Figure Captions

Figure 1. Simulation of auditory filter bank response to the
frequency-modulated signals used in the present experiment. Each
sinusoid traverses 400 Hz in 100 msec. Center frequency is 1000 Hz. The
top panel shows the response for the linear glide. The next panel shows
the response for a two-step transition. Each succeeding panel shows the
response for 3-, 5-, 9- and 17-step transitions.

Figure 2. Each panel displays averaged performance for the four
listeners as psychometric functions. The ordinate shows percent correct
in the 2Q,2AFC task. The abscissa is the duration of an individual step
of the multiple-step transition. Each function within a panel represents
performance for GLIDE vs STEP signals of a different duration, plotted
against the common individual-step-duration axis. Signal duration is
indicated by symbol type: open circles = 100 msec, filled circles =
50 msec and triangles = 25 msec. Width of frequency transition is shown
by line type: solid lines indicate ΔF = 400 Hz, medium-dashed lines
200 Hz and dotted lines 100 Hz. The top panel contains psychometric
functions for a 2 Hz/msec transition rate, the middle panel shows
performance for 4 Hz/msec, and the bottom panel displays 8 Hz/msec
performance.

Figure 3. Discrimination thresholds obtained from averaged psychometric
functions for signal frequencies from 250 Hz to 6 kHz. The threshold is
defined as the step duration that would lead to 75% correct
discriminations in the 2Q,2AFC task. Circles represent the threshold for
the 2 Hz/msec transition rate, triangles 4 Hz/msec, and squares
8 Hz/msec. The original four listeners are represented by filled symbols
for frequencies up to 4 kHz. A different group of four listeners, tested
several months later, is represented by open symbols at 6 kHz.

Figure 4. The effect of presentation level on GLIDE vs STEP
discrimination thresholds. At each presentation level, the bars are
coded to reflect transition rate. Only one rate was tested at 70 dB SL.
ABSTRACT


The intensity-weighted average of instantaneous frequency (IWAIF) is
developed as a model to predict listener performance in tasks primarily
requiring frequency discrimination. IWAIF is closely related to the
EWAIF model proposed by Feth for similar tasks. The primary difference
is that the IWAIF model uses intensity (envelope squared) as the
weighting function instead of the envelope. The advantages of IWAIF over
EWAIF are that (a) it has a convenient frequency-domain interpretation,
and (b) it is much simpler to compute than the EWAIF.

PACS numbers: 43.66.Ba, 43.66.Fe
INTRODUCTION


The envelope-weighted average of instantaneous frequency (EWAIF) model was developed nearly two decades ago by Feth (1974) to account for the discriminability of two-tone complexes. Helmholtz (1954) reported that the pitch of a two-component complex tone is shifted towards the frequency of the component whose amplitude is increased slightly. Helmholtz attributed the pitch shift to fluctuations in the instantaneous frequency of the two-tone complex. Feth and coworkers (Feth, 1974; Feth and O'Malley, 1977; Feth et al., 1982) have studied the discriminability of complementary pairs of two-tone complexes (Voelcker, 1966a, b). Feth showed that the pitch differences are proportional to the EWAIF differences between the complex signals. Since then, the EWAIF model has been used to explain a variety of discrimination tasks in which the pitch of the stimulus is the dominant cue. For example, Feth and Stover (1987) extended the model to explain an anomaly in data relating to "profile signals" (Green, 1988). The central theme of this model is that, for certain signal pairs, listeners use pitch differences to discriminate between them. Feth's model attempts to quantify the pitch changes observable in the discrimination of complex stimuli. In the case of profile signals, it is assumed that changes in the spectral shape of the profile signals produce a noticeable change in the perceived pitch.

Computing the EWAIF of a signal is sometimes difficult, especially for wideband signals. One problem is that the derivatives of signals have to be computed in order to arrive at the EWAIF. Differentiation is a highly noise-sensitive operation, which may lead to incorrect values of the EWAIF. Also, it is sometimes necessary to compute the ratio of two numbers that are nearly zero, which may not be possible on computers because of their finite word length. It has been reported that other weighting functions, such as intensity (the square of the envelope), perform equally well in predicting pitch differences (Feth et al., 1982). Anantharaman et al. (1991) used such an intensity-weighted average of instantaneous frequency (IWAIF) model to predict frequency differences. In this paper, we further investigate the IWAIF model in terms of its computational difficulty and its capability to predict pitch differences. It is found that the IWAIF model is easier and much faster to compute than the EWAIF model, and that it does not have the drawbacks of the EWAIF model mentioned above.

First, the time- and frequency-domain representations of the EWAIF are presented. The IWAIF of a signal is then defined, and its representation in the frequency domain is derived. The performance of the IWAIF model is then compared to that of the EWAIF model in a number of psychoacoustic tasks.
I. EWAIF MODEL


A. EWAIF in the time domain

In general, a finite-energy real signal s(t), which has a Fourier transform

$$S(f) = \int_{-\infty}^{\infty} s(t)\,e^{-j2\pi f t}\,dt \qquad (1)$$

can be represented as (McGillem, 1979; Voelcker, 1966a, b)

$$s(t) = e(t)\cos\phi(t), \qquad 0 < t < T \qquad (2)$$

$$\phantom{s(t)} = \mathrm{Re}\left[e(t)\,e^{j\phi(t)}\right] \qquad (3)$$

where e(t) is the instantaneous envelope, φ(t) is the instantaneous phase, and Re denotes the real-part operator. The instantaneous frequency f(t) is defined as

$$f(t) = \frac{1}{2\pi}\,\frac{d\phi(t)}{dt} \qquad (4)$$

Such a representation of s(t) is not unique. For example, e(t) can be chosen to satisfy (3) for an arbitrary φ(t). A unique e(t) and φ(t) can be assured by imposing an additional constraint, namely, that the real and imaginary parts of the complex signal $e(t)\,e^{j\phi(t)}$ form a Hilbert transform pair. Such a complex signal is termed analytic and has certain useful properties. Thus, the analytic signal corresponding to the real signal s(t) can be written as

$$m(t) = s(t) + j\,\hat{s}(t) \qquad (5)$$

$$\phantom{m(t)} = e(t)\,e^{j\phi(t)} \qquad (6)$$

where

$$\hat{s}(t) = \text{the Hilbert transform of } s(t) \qquad (7)$$

The envelope and instantaneous frequency functions, e(t) and f(t), can be defined in terms of s(t) and ŝ(t) as

$$e(t) = |m(t)| = \left[s^2(t) + \hat{s}^2(t)\right]^{1/2} \qquad (8)$$

$$\phi(t) = \arctan\frac{\hat{s}(t)}{s(t)} \qquad (9)$$

$$f(t) = \frac{1}{2\pi}\,\frac{s(t)\,\hat{s}'(t) - s'(t)\,\hat{s}(t)}{s^2(t) + \hat{s}^2(t)} \qquad (10)$$

The envelope-weighted average of instantaneous frequency (EWAIF) of s(t) is defined as

$$\mathrm{EWAIF}[s(t)] = \frac{\int_0^T e(t)\,f(t)\,dt}{\int_0^T e(t)\,dt} \qquad (11)$$
A common method of calculating the EWAIF of a signal is to determine the envelope and instantaneous frequency functions using (8) and (10) and then compute the required integrals in (11). However, this method runs into computational problems for broadband signals. Note that the expression (10) for f(t) involves differentiation, which is a highly noise-sensitive operation.
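As a rough illustration of this time-domain route, the sketch below computes the EWAIF of a sampled signal from (8), (4) and (11). It is a minimal sketch under stated assumptions, not the procedure used for the values in Table 1: it assumes the record is exactly one period of a periodic signal (so an FFT-based Hilbert transform is exact), replaces the derivative in (4) by the phase advance between successive samples, and uses hypothetical helper names (`analytic_signal`, `ewaif_time_domain`). The test signal is the first complementary pair of Table 1 (1000 Hz at 71 dB plus 1020 Hz at 70 dB), for which the result should come out close to the 1007.59 Hz listed there.

```python
import numpy as np


def analytic_signal(x):
    """m[n] = s[n] + j*H{s}[n] via the FFT (exact only for one period of a periodic signal)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)  # negative frequencies are zeroed


def ewaif_time_domain(x, fs):
    """Envelope-weighted average of instantaneous frequency, eqs. (8), (4) and (11)."""
    m = analytic_signal(x)
    m_next = np.roll(m, -1)               # periodic record: last sample is followed by the first
    env = np.abs(m)                       # e(t) = |m(t)|, eq. (8)
    # Phase advance over each sample interval (wrapped to (-pi, pi]); scaled by fs / (2*pi)
    # this approximates f(t) = (1/2pi) dphi/dt, eq. (4).
    inst_f = np.angle(m_next * np.conj(m)) * fs / (2.0 * np.pi)
    env_mid = 0.5 * (env + np.abs(m_next))  # envelope at the middle of each interval
    return np.sum(env_mid * inst_f) / np.sum(env_mid)   # eq. (11) as a Riemann sum


fs = 50_000                               # sample rate in Hz (an assumed value)
t = np.arange(25_000) / fs                # 0.5 s: whole numbers of cycles of both tones
a1, a2 = 10 ** (71 / 20), 10 ** (70 / 20)
x = a1 * np.cos(2 * np.pi * 1000 * t) + a2 * np.cos(2 * np.pi * 1020 * t)
ewaif_a = ewaif_time_domain(x, fs)
print(f"EWAIF = {ewaif_a:.2f} Hz")
```

Using the phase difference of consecutive analytic samples, rather than differentiating a noisy estimate of φ(t), sidesteps part of the noise sensitivity mentioned above, but it still needs the Hilbert transform and still divides by a small envelope near envelope minima.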


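B. Frequency domain representation of EWAIF

Alternatively, f(t) can be expressed in terms of the analytic signal m(t) alone by rewriting (6) as

$$\ln m(t) = \ln|m(t)| + j\phi(t) \qquad (12)$$

Hence,

$$\phi(t) = \mathrm{Im}[\ln m(t)] \qquad (13)$$

where Im denotes the imaginary-part operator, and

$$f(t) = \frac{1}{2\pi}\,\mathrm{Im}\!\left[\frac{m'(t)}{m(t)}\right] \qquad (14)$$

Inserting the above equations in the expression for EWAIF (11), we have

$$\mathrm{EWAIF}[s(t)] = \frac{1}{2\pi}\,\frac{\int_0^T |m(t)|\,\mathrm{Im}\!\left[\dfrac{m'(t)}{m(t)}\right] dt}{\int_0^T |m(t)|\,dt} \qquad (15)$$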
This can be expressed in terms of the Fourier transform of $\sqrt{m(t)}$ as (see Appendix A for the derivation)

$$\mathrm{EWAIF}[s(t)] = 2\,\frac{\int_{-\infty}^{\infty} f\,|M_s(f)|^2\,df}{\int_{-\infty}^{\infty} |M_s(f)|^2\,df} \qquad (16)$$

where $M_s(f) = \mathcal{F}\left[\sqrt{m(t)}\right]$.

The EWAIF is thus twice the frequency of the "center of gravity" of $|M_s(f)|^2$. While this is an interesting observation, it is of little use in computing the EWAIF of a signal. Indeed, in order to obtain $M_s(f)$, the square root of a complex signal has to be computed. In computing $\sqrt{m(t)}$, we have to be careful in choosing the branch of the square root; this is similar to the phase-unwrapping problem encountered in signal processing. Further, because the envelope appears in the denominator of f(t) (see (10)), care must be taken in computing the instantaneous frequency at points where the envelope is zero or nearly zero. This involves computing the limit of a ratio of two functions that both approach zero rather than a simple division.
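The branch problem can be seen in a small numerical sketch of (16). Taking the principal square root sample by sample would flip the sign of $\sqrt{m(t)}$ each time the phase of m(t) wraps past ±π; the sketch instead halves the unwrapped phase so that $\sqrt{m(t)}$ stays continuous. This is an illustrative choice, not necessarily the authors' procedure. It also assumes that the record covers whole periods and that the phase of m(t) advances by an even multiple of 2π over the record, so that $\sqrt{m(t)}$ is itself periodic. Function names are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal (x assumed to be one period of a periodic signal)."""
    n = len(x)
    w = np.zeros(n)
    w[0] = 1.0
    if n % 2 == 0:
        w[n // 2] = 1.0
        w[1:n // 2] = 2.0
    else:
        w[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * w)


def ewaif_frequency_domain(x, fs):
    """Eq. (16): EWAIF = 2 * spectral centroid of |M_s(f)|^2, with M_s the FFT of sqrt(m)."""
    m = analytic_signal(x)
    phase = np.unwrap(np.angle(m))                       # continuous phase phi(t)
    sqrt_m = np.sqrt(np.abs(m)) * np.exp(0.5j * phase)   # continuous branch of sqrt(m(t))
    power = np.abs(np.fft.fft(sqrt_m)) ** 2              # |M_s(f)|^2
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)           # signed bin frequencies in Hz
    return 2.0 * np.sum(freqs * power) / np.sum(power)


fs = 50_000                                   # assumed sample rate (Hz)
t = np.arange(25_000) / fs                    # 0.5 s; m(t) winds 500 times (an even number)
a1, a2 = 10 ** (71 / 20), 10 ** (70 / 20)
x = a1 * np.cos(2 * np.pi * 1000 * t) + a2 * np.cos(2 * np.pi * 1020 * t)
ewaif_f = ewaif_frequency_domain(x, fs)
print(f"EWAIF via (16) = {ewaif_f:.2f} Hz")
```

Under these assumptions the value should agree closely with the time-domain EWAIF for the same pair; the phase-unwrapping step is exactly the extra bookkeeping that the text warns about.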
II. IWAIF


A.  IWAIF  in  the  time  domain 

In computing the EWAIF, the envelope of the signal is used as the weighting function for finding the average of the instantaneous frequency. Other weighting functions may model listener discriminability as well. Indeed, Feth et al. (1982) observe:

Our previous modeling of the discriminability of two-component complex tones has concentrated on the envelope-weighted arithmetic average of the instantaneous frequency fluctuation in each complex signal. For these results we investigated the predictions of another weighting function, the instantaneous intensity (envelope-squared). Also, we calculated the root-mean-square (rms) average of the instantaneous frequency, and the rms of the envelope-weighted instantaneous frequency.

In all cases, the relations among the three model predictions do not vary with respect to one another. The prediction of the envelope-weighted arithmetic average model lies above that of the rms of envelope-weighted arithmetic average model, which in turn is above the intensity-weighted arithmetic average model. The differences among the predictions for these three models are so small that similar duplication was avoided in plotting the comparisons for subjects 2 through 4. These small differences among the three models can be interpreted as support for the weighted time average of instantaneous frequency models in general. They indicate an insensitivity to the choice of the actual weighting function (envelope or intensity) and an insensitivity to the choice of the averaging (arithmetic versus rms).


Let us investigate the intensity-weighted (arithmetic) average of instantaneous frequency (IWAIF) of a signal. The IWAIF of s(t) is defined as

$$\mathrm{IWAIF}[s(t)] = \frac{\int_0^T e^2(t)\,f(t)\,dt}{\int_0^T e^2(t)\,dt} \qquad (17)$$


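B. Frequency domain representation of IWAIF

Much of the discussion in this section follows that in Anantharaman (1992). We can rewrite (17) in terms of m(t) as

$$\mathrm{IWAIF}[s(t)] = \frac{1}{2\pi}\,\frac{\int_0^T |m(t)|^2\,\mathrm{Im}\!\left[\dfrac{m'(t)}{m(t)}\right] dt}{\int_0^T |m(t)|^2\,dt} \qquad (18)$$

Invoking Parseval's relation, this becomes (see Appendix B)

$$\mathrm{IWAIF}[s(t)] = \frac{\int_{-\infty}^{\infty} f\,|M(f)|^2\,df}{\int_{-\infty}^{\infty} |M(f)|^2\,df} \qquad (19)$$

where M(f) is the Fourier transform of m(t).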
This can be further simplified by taking advantage of the one-sided nature of M(f) and its relation to S(f). Equation (19) then becomes

$$\mathrm{IWAIF}[s(t)] = \frac{\int_0^\infty f\,|S(f)|^2\,df}{\int_0^\infty |S(f)|^2\,df} \qquad (20)$$

Thus, the IWAIF of a real signal is located exactly at the "center of gravity" of the positive-frequency portion of its energy density spectrum. Compare this with (16), the frequency-domain expression for the EWAIF. Computing the IWAIF using relation (20) simply involves taking the Fourier transform of the signal, which can be done easily with the Fast Fourier Transform (FFT) algorithm.
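The step from (19) to (20) rests on a standard property of the analytic signal that the text leaves implicit: since m(t) = s(t) + jŝ(t), its spectrum is M(f) = 2S(f) for f > 0 and M(f) = 0 for f < 0. Substituting into (19), the factor 4 in |M(f)|² = 4|S(f)|² cancels between numerator and denominator:

$$\frac{\int_{-\infty}^{\infty} f\,|M(f)|^2\,df}{\int_{-\infty}^{\infty} |M(f)|^2\,df} = \frac{\int_{0}^{\infty} f\,4|S(f)|^2\,df}{\int_{0}^{\infty} 4|S(f)|^2\,df} = \frac{\int_{0}^{\infty} f\,|S(f)|^2\,df}{\int_{0}^{\infty} |S(f)|^2\,df}$$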

An alternative derivation of the IWAIF is to find a suitable f0 such that s(t) represents a modulated wave of the form

$$s(t) = e(t)\cos[2\pi f_0 t + \theta(t)] \qquad (21)$$

$$\phantom{s(t)} = \mathrm{Re}[\psi(t)] \qquad (22)$$

with $\psi(t) = e(t)\,e^{j[2\pi f_0 t + \theta(t)]}$. Here e(t) is thought of as the envelope of s(t) and θ(t) as its phase. For narrow-band e(t) and θ(t), this represents the modulation of a sinusoidal carrier wave of frequency f0. The instantaneous frequency of the signal is

$$f(t) = f_0 + \frac{1}{2\pi}\,\frac{d\theta(t)}{dt} \qquad (23)$$
The choice of f0 can be arbitrary so long as the mathematical relations remain valid. The most common choice (McGillem, 1979) is to select f0 so that it is the center of gravity of |Ψ(f)|², where Ψ(f) is the Fourier transform of ψ(t). This corresponds to the center of gravity of the positive-frequency portion of the energy density spectrum of the signal. The required value of f0 is the value that minimizes the integral

$$\int_0^\infty (f - f_0)^2\,|\Psi(f)|^2\,df \qquad (24)$$

which is the same as the IWAIF of the signal s(t).
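That the minimizer of (24) is the spectral center of gravity follows from setting the derivative with respect to f0 to zero:

$$\frac{d}{df_0}\int_0^\infty (f-f_0)^2\,|\Psi(f)|^2\,df = -2\int_0^\infty (f-f_0)\,|\Psi(f)|^2\,df = 0 \;\Longrightarrow\; f_0 = \frac{\int_0^\infty f\,|\Psi(f)|^2\,df}{\int_0^\infty |\Psi(f)|^2\,df}$$

The second derivative, $2\int_0^\infty |\Psi(f)|^2\,df$, is positive, so this stationary point is a minimum. When |Ψ(f)|² is proportional to |S(f)|² over f > 0 (the correspondence noted above), this value of f0 coincides with (20).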

C.  Computation  of  IWAIF 

The above frequency-domain representation (20) provides a simple and efficient procedure for computing the IWAIF of a signal. It eliminates most of the difficulties encountered in computing the EWAIF of the signal. Moreover, the IWAIF is completely described by the energy spectrum of the signal alone. This obviates the need to compute a Hilbert transform and a derivative. All that needs to be computed is the Fourier transform of s(t), which can be done efficiently using the FFT algorithm.

Suppose s(t) is sampled at a rate F_s to yield N samples, s[n], n = 0, 1, ..., N − 1, and let its N-point FFT be S[k], k = 0, 1, ..., N − 1. Then the IWAIF of s(t) can be computed as

$$\mathrm{IWAIF}[s(t)] \approx \frac{\sum_{k=0}^{N/2} k\,\Delta f\,|S[k]|^2}{\sum_{k=0}^{N/2} |S[k]|^2} \qquad (25)$$

$$\phantom{\mathrm{IWAIF}[s(t)]} = \Delta f\,\frac{\sum_{k=0}^{N/2} k\,|S[k]|^2}{\sum_{k=0}^{N/2} |S[k]|^2} \qquad (26)$$

where Δf = F_s/N is the frequency spacing between samples of the FFT.
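A minimal sketch of (26), assuming a real-valued record and NumPy's `numpy.fft.rfft`, which returns exactly the non-negative-frequency bins k = 0, ..., N/2 used in the sums. `iwaif_fft` is a hypothetical name. With both tones of the first Table 1 pair falling exactly on FFT bins, the result should equal the intensity-weighted mean (1000·a1² + 1020·a2²)/(a1² + a2²) to within rounding.

```python
import numpy as np


def iwaif_fft(x, fs):
    """IWAIF by eq. (26): spectral centroid of |S[k]|^2 over k = 0 .. N//2."""
    spectrum = np.fft.rfft(x)            # non-negative-frequency bins only
    power = np.abs(spectrum) ** 2         # |S[k]|^2
    k = np.arange(len(spectrum))
    delta_f = fs / len(x)                 # frequency spacing Δf = Fs / N
    return delta_f * np.sum(k * power) / np.sum(power)


fs = 50_000                               # assumed sample rate (Hz)
t = np.arange(25_000) / fs                # Δf = 2 Hz, so 1000 and 1020 Hz sit on bins
a1, a2 = 10 ** (71 / 20), 10 ** (70 / 20)
x = a1 * np.cos(2 * np.pi * 1000 * t) + a2 * np.cos(2 * np.pi * 1020 * t)
iwaif_a = iwaif_fft(x, fs)
expected = (1000 * a1**2 + 1020 * a2**2) / (a1**2 + a2**2)
print(f"IWAIF = {iwaif_a:.2f} Hz (closed form {expected:.2f} Hz)")
```

Note that no Hilbert transform, derivative or division by a small envelope appears; the only division is by the total energy. If component frequencies do not fall on FFT bins, spectral leakage spreads their energy, and the result then depends somewhat on the record length.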


III.  COMPARISON  OF  IWAIF  AND  EWAIF 

As mentioned earlier, the only difference between the IWAIF of a signal and its EWAIF is the choice of weighting function. While the envelope is used to weight the instantaneous frequency in calculating the EWAIF, the intensity (envelope squared) is used as the weighting function in IWAIF calculations. Since both envelope and intensity are non-negative and the latter is the square of the former, the weighting functions are highly correlated. Thus, similar values are expected for the EWAIF and IWAIF of a signal. For a simple sinusoid, both the EWAIF and the IWAIF are equal to the tone frequency f0. For a combination of two tones of the same amplitude, the EWAIF and IWAIF values are again equal and are located at the mean of the two frequencies. It is difficult to calculate the EWAIF of a combination of tones analytically. However, the IWAIF of an N-component complex can be calculated easily. Assuming T to be much larger than the longest of the tone periods, the IWAIF of a sum of sinusoids such as

$$s(t) = \sum_i a_i \cos(2\pi f_i t), \qquad 0 < t < T \qquad (27)$$

is approximately equal to the weighted mean

$$\mathrm{IWAIF}[s(t)] \approx \frac{\sum_i a_i^2\, f_i}{\sum_i a_i^2} \qquad (28)$$
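Equation (28) can be checked directly against Table 1. The sketch below (hypothetical function names) converts the component levels to relative amplitudes, a = 10^(L/20), and forms the intensity-weighted mean of the two component frequencies.

```python
def db_to_amplitude(level_db):
    """Relative amplitude for a level in dB (20*log10 convention)."""
    return 10.0 ** (level_db / 20.0)


def iwaif_sum_of_sinusoids(amplitudes, freqs):
    """Eq. (28): intensity-weighted mean of component frequencies (valid for T >> longest period)."""
    weights = [a * a for a in amplitudes]          # a_i^2
    return sum(w * f for w, f in zip(weights, freqs)) / sum(weights)


pair_a = iwaif_sum_of_sinusoids([db_to_amplitude(71), db_to_amplitude(70)], [1000.0, 1020.0])
pair_b = iwaif_sum_of_sinusoids([db_to_amplitude(70), db_to_amplitude(71)], [1000.0, 1020.0])
print(f"{pair_a:.2f} {pair_b:.2f}")   # prints: 1008.85 1011.15
```

Both values lie within about 0.03 Hz of the IWAIF column of Table 1 (1008.82 and 1011.18 Hz), and for equal amplitudes the function returns the mean of the two frequencies, as stated above.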

The IWAIF model was applied to stimuli from some of the experiments for which the EWAIF model was able to predict the pitch differences. Table 1 shows the corresponding EWAIF and IWAIF values for a complementary Voelcker signal pair. The time-domain representation (11) was used to calculate the EWAIF values, and the frequency-domain representation (20) was used to calculate the IWAIF values. As noted in Feth et al. (1982), IWAIF differences are smaller than EWAIF differences (2.36 Hz versus 4.82 Hz for the pair in Table 1).

Iwamiya et al. (1983) studied the location of the principal pitch of FM-AM tones. These vibrato tones are generated by modulating a carrier in both frequency and amplitude with the same modulating signal. The stimuli used in this particular set of experiments consisted of a sinusoidal carrier at frequencies of 440, 880, and 1500 Hz modulated by a triangular wave of frequency 6 Hz. If D_AM is the "degree of AM" and E_FM is the "extent of FM", the modulated signal for a carrier frequency f_c is given by

$$s(t) = \left[1 + D_{AM}\, m(t)\right]\cos\!\left[2\pi f_c t + 0.5\,E_{FM}\int m(\tau)\,d\tau\right] \qquad (29)$$

where m(t) here denotes the modulating (triangular) waveform rather than the analytic signal. Listeners were asked to match the pitch of these modulated tones to a pure tone. The experiment was conducted for two cases. In the first case, the degree of AM was set at unity and the extent of FM took values of 0, 25, 50 and 100 cents. In the second case, the extent of FM was held constant at 100 cents and the degree of AM took values of 0.00, 0.50, 0.75 and 1.00. For each case, two sets of data were collected, with the frequency and amplitude modulations in phase and in antiphase, respectively. EWAIF and IWAIF values were calculated for the case in which the degree of AM is constant. The results are shown in Fig. 1 for two carrier frequencies, 440 and 880 Hz. The curves with a positive slope represent FM and AM in phase, while those with a negative slope have their frequency and amplitude modulations 180° out of phase. IWAIF values are plotted along the dashed line, while the dotted line corresponds to the EWAIF values. The regression equations for the localized principal pitch calculated by the authors are also shown as the solid line. It is clear from the figure that the weighted averages are reasonably close to the principal pitch values. Here, IWAIF differences are slightly larger than EWAIF differences.

Similar calculations were carried out for profile signals (Green, 1988; Green et al., 1984). The standard consists of logarithmically spaced components drawn from a set of eleven frequencies ranging from 200 to 5000 Hz. An equal number of components is present on either side of the signal, which is always at 1000 Hz. Thus, the three-component signal would have components ranging from 724 to 1380 Hz, while the eleven-component complex would span the whole range from 200 to 5000 Hz. The results are shown in Fig. 2. The average threshold for a signal added to the middle component (1000 Hz) of the complex is plotted as a function of the number of components in the complex. The EWAIF and IWAIF values are superimposed on this graph, with the solid line denoting differences in IWAIF values and the broken line denoting differences in EWAIF values, as a percentage. Again, the difference between EWAIF and IWAIF values is very small.
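The component frequencies quoted above follow from spacing eleven frequencies logarithmically between 200 and 5000 Hz, so that adjacent components differ by a factor of 25^(1/10) ≈ 1.38 and the sixth (middle) component falls at 1000 Hz. A short sketch (hypothetical function names) reproduces the three-component range; its lower component comes out at about 724.8 Hz, which the text gives as 724 Hz.

```python
def profile_frequencies(f_low=200.0, f_high=5000.0, count=11):
    """Logarithmically spaced component frequencies from f_low to f_high inclusive."""
    ratio = (f_high / f_low) ** (1.0 / (count - 1))   # 25 ** 0.1 for the defaults
    return [f_low * ratio**i for i in range(count)]


def centered_components(freqs, n_components):
    """The n_components (odd) frequencies centred on the middle (1000-Hz) component."""
    mid, half = len(freqs) // 2, n_components // 2
    return freqs[mid - half: mid + half + 1]


freqs = profile_frequencies()
for n in (3, 5, 7, 9, 11):
    comps = centered_components(freqs, n)
    print(n, round(comps[0], 1), round(comps[-1], 1))
```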

IV.  CONCLUSIONS 

The intensity-weighted average of instantaneous frequency (IWAIF) model has been presented as an alternative to the envelope-weighted average of instantaneous frequency (EWAIF) model. Calculation of the EWAIF of a signal involves determining the envelope and the instantaneous frequency functions of the signal separately. This can be computationally cumbersome, especially as the bandwidth of the signal gets wider. The IWAIF of a signal, on the other hand, can be expressed solely in terms of the magnitude spectrum of the signal. Such a frequency-domain representation provides a fast and efficient method to compute the IWAIF of a signal using the FFT algorithm.

The IWAIF model was tested on three sets of stimuli, viz., Voelcker's complementary two-tone complexes used in experiments by Feth and co-workers (Feth, 1974; Feth et al., 1982; Feth and O'Malley, 1977), FM-AM tones used by Iwamiya et al. (1983), and profile signals used by Green et al. (1984). The performance of the IWAIF model was found to be comparable to that of the EWAIF model.
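APPENDIX A: EWAIF

The frequency-domain representation of the EWAIF can be derived as follows. The EWAIF of s(t) can be written as

$$\mathrm{EWAIF} = \frac{1}{2\pi}\,\frac{\int_0^T |m(t)|\,\mathrm{Im}\!\left[\dfrac{m'(t)}{m(t)}\right] dt}{\int_0^T |m(t)|\,dt} \qquad (A1)$$

Consider the numerator:

$$\frac{1}{2\pi}\int_0^T |m(t)|\,\mathrm{Im}\!\left[\frac{m'(t)}{m(t)}\right] dt = \frac{1}{2\pi}\int_0^T |m(t)|\,\mathrm{Im}\!\left[\frac{m'(t)\,m^*(t)}{|m(t)|^2}\right] dt \qquad (A2)$$

$$= \frac{1}{2\pi}\int_0^T \mathrm{Im}\!\left[\frac{m'(t)\,m^*(t)}{|m(t)|}\right] dt \qquad (A3)$$

where m*(t) denotes the complex conjugate of m(t). Since $[\sqrt{m(t)}]' = m'(t)/[2\sqrt{m(t)}]$ and $[\sqrt{m(t)}]^*/\sqrt{m(t)} = |m(t)|/m(t)$,

$$\frac{m'(t)\,m^*(t)}{|m(t)|} = 2\,[\sqrt{m(t)}]'\,[\sqrt{m(t)}]^* \qquad (A4)$$

so that the numerator becomes

$$\frac{1}{\pi}\,\mathrm{Im}\int_0^T [\sqrt{m(t)}]'\,[\sqrt{m(t)}]^*\,dt \qquad (A5)$$

Applying the theorem for the Fourier transform of the derivative of a signal and invoking Parseval's theorem, the numerator can be expressed as

$$\frac{1}{\pi}\,\mathrm{Im}\int_{-\infty}^{\infty} j2\pi f\,M_s(f)\,M_s^*(f)\,df \qquad (A6)$$

$$= 2\int_{-\infty}^{\infty} f\,|M_s(f)|^2\,df \qquad (A7)$$

where $M_s(f) = \mathcal{F}[\sqrt{m(t)}]$ is the Fourier transform of $\sqrt{m(t)}$.

Similarly, the denominator can be expressed as

$$\int_0^T |m(t)|\,dt = \int_0^T \sqrt{m(t)}\,[\sqrt{m(t)}]^*\,dt \qquad (A8)$$

$$= \int_{-\infty}^{\infty} |M_s(f)|^2\,df \qquad (A9)$$

Hence

$$\mathrm{EWAIF} = 2\,\frac{\int_{-\infty}^{\infty} f\,|M_s(f)|^2\,df}{\int_{-\infty}^{\infty} |M_s(f)|^2\,df} \qquad (A10)$$

APPENDIX B: IWAIF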

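IWAIF can be variously expressed in terms of s(t) and ŝ(t). Using (17), the IWAIF of s(t) can be written as

$$\mathrm{IWAIF} = \frac{1}{2\pi}\,\frac{\int_0^T [s(t)\,\hat{s}'(t) - s'(t)\,\hat{s}(t)]\,dt}{\int_0^T [s^2(t) + \hat{s}^2(t)]\,dt} \qquad (B1)$$

$$= \frac{1}{2\pi}\,\frac{\int_0^T s(t)\,\hat{s}'(t)\,dt}{\int_0^T s^2(t)\,dt} \qquad (B2)$$

$$= -\frac{1}{2\pi}\,\frac{\int_0^T s'(t)\,\hat{s}(t)\,dt}{\int_0^T \hat{s}^2(t)\,dt} \qquad (B3)$$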
The last two relations were obtained by noting that the two terms in the numerator of (B1) have equal integrals (by integration by parts), as do the two terms in the denominator (since the Hilbert transform preserves signal energy).

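In order to derive the frequency-domain representation of IWAIF, consider

$$\mathrm{IWAIF} = \frac{1}{2\pi}\,\frac{\int_0^T |m(t)|^2\,\mathrm{Im}\!\left[\dfrac{m'(t)}{m(t)}\right] dt}{\int_0^T |m(t)|^2\,dt} \qquad (B4)$$

In the frequency domain the numerator can be expressed as

$$\frac{1}{2\pi}\int_0^T |m(t)|^2\,\mathrm{Im}\!\left[\frac{m'(t)}{m(t)}\right] dt \qquad (B5)$$

$$= \frac{1}{2\pi}\int_0^T |m(t)|^2\,\mathrm{Im}\!\left[\frac{m'(t)\,m^*(t)}{m(t)\,m^*(t)}\right] dt \qquad (B6)$$

$$= \frac{1}{2\pi}\,\mathrm{Im}\int_0^T m'(t)\,m^*(t)\,dt \qquad (B7)$$

$$= \int_{-\infty}^{\infty} f\,|M(f)|^2\,df \qquad (B8)$$

The expression for the Fourier transform of the derivative of a signal, as well as Parseval's relation, were used in the foregoing simplification. Again, by Parseval's relation, the denominator is

$$\int_0^T e^2(t)\,dt = \int_0^T |m(t)|^2\,dt \qquad (B9)$$

$$= \int_{-\infty}^{\infty} |M(f)|^2\,df \qquad (B10)$$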
Hence, since M(f) = 0 for f < 0,

$$\mathrm{IWAIF} = \frac{\int_0^\infty f\,|M(f)|^2\,df}{\int_0^\infty |M(f)|^2\,df} \qquad (B11)$$
FOOTNOTES


*  Re  denotes  the  real  part  operator 
[Figure 1(a): localized principal pitch (P), EWAIF and IWAIF (cents) versus extent of FM (0, 25, 50 and 100 cents); carrier frequency 440 Hz; constant AM (1.00)]

J. Acoust. Soc. Am.    Anantharaman, Krishnamurthy and Feth: IWAIF

[Figure 1(b): localized principal pitch (P), EWAIF and IWAIF (cents); carrier frequency 880 Hz]
FIG. 1: Localized principal pitch, EWAIF and IWAIF of FM-AM tones as a function of the extent of FM, with the frequency and amplitude modulations both in phase (lines with positive slope) and in antiphase (lines with negative slope) (Iwamiya et al., 1983). (a) 440 Hz carrier frequency; (b) 880 Hz carrier frequency.
FIG. 2: EWAIF and IWAIF values calculated for profile signals (Green et al., 1984). EWAIF values are plotted as a dashed line (… symbols), while IWAIF values are plotted as a solid line (… symbols). The increment threshold for multi-component profile signals is also shown as a solid line ('o' symbols).
[Figure 2 axis label: IWAIF-EWAIF difference (%)]


TABLE 1: EWAIF and IWAIF values for complementary Voelcker signal pairs.

Signal (two components)                  EWAIF (Hz)    IWAIF (Hz)
(1000 Hz, 71 dB) + (1020 Hz, 70 dB)      1007.59       1008.82
(1000 Hz, 70 dB) + (1020 Hz, 71 dB)      1012.41       1011.18
A set of TMS320 assembly programs has been implemented to generate different stimuli used in psychoacoustic experiments. GENTONE.ASM generates single-frequency pure tones, step tones and glide signals. MULTI.ASM can be used to generate multi-component signals. Finally, CLICKS.ASM can generate two click sequences with a specified delay time between them. Using these TMS320 programs, stimuli can be generated in real time (GENTONE.ASM and CLICKS.ASM) or much faster than with conventional microcomputers (MULTI.ASM). The TMS320 programs can be invoked from several high-level programming languages, including C, PASCAL, FORTRAN and BASIC. All these programs have been tested on the Ariel DSP-16 board installed in IBM PC-compatible computers.


Introduction 
Generating accurate and stable stimuli is important in many hearing experiments. Since microcomputers have been widely used to control and monitor laboratory events, it is very convenient to generate or digitize analog signals by using a digital-to-analog converter (DAC) and an analog-to-digital converter (ADC).

Because multi-listener adaptive paradigms and/or roving-frequency paradigms are included in many hearing experiments, it is often necessary to generate stimuli in "real time" rather than to use stored signals. Using stored signals requires disk I/O time, and this longer delay often cannot satisfy the requirements of experiments that use multiple-listener, adaptive-tracking or roving-frequency procedures. In addition, stored signals usually consume a great deal of disk space.

This report describes a set of programs that can generate signals in real time for use with IBM-compatible microcomputers. The programs discussed in this report are implemented in TMS320 assembly language (Texas Instruments Incorporated, 1987) and can run independently on the Ariel DSP-16 board (Ariel Corporation, 1987). The TMS320 assembly programs are very fast and can communicate with many high-level languages. Therefore, they provide a powerful utility for generating signals in "real time". In this report, we introduce several TMS320 programs that generate pure tones, step tones, glides, multi-component signals and clicks.

The Ariel DSP-16 provides a very good hardware and software environment for developing our application programs. The resident monitor (RESMON) is a very useful tool for writing a TMS320 application program; it was used to simplify the construction of all the programs described in this article. RESMON is always assembled to fill 1024 words (from address 0 to 1023). For more information about the DSP-16 and RESMON, please refer to the operating manual (Ariel Corporation, 1987).

All of the programs described in this report have been used with TURBO PASCAL (Borland International, 1989) in our laboratory to control experiments such as "frequency discrimination" and "complex sound discrimination".

Pure  tone,  step  tone  and  glide  signals 

To generate single-frequency tones, step tones or glides, we use the table-lookup method. A one-cycle sine wave (with frequency f = 1 Hz at a sample rate of R = 50,000 samples per second) is saved in the data buffer, occupying L = 50,000 memory locations. To convert digital samples, the TMS320 program keeps an address variable (Addr) that points to the memory location to be converted to analog form at a given instant. After each sample conversion, the address variable is incremented by Δ to point to the next sample in the data buffer. When Addr becomes larger than 49,999, it is reset to the offset value (Addr − 50,000). A waveform is therefore generated by the continuous recycling of the sampled sine data. If we set the sample rate of the DSP-16 to 50 kHz, the output frequency f′ is equal to Δ (in Hz), since f′ = Δ × R/L. The frequency resolution is, therefore, 1 Hz.
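The table-lookup scheme above is essentially a phase accumulator with an integer step. The Python sketch below mimics it; it is not the TMS320 code, and `table_lookup_tone`, `SINE_TABLE` and the other names are illustrative. It shows why, with R = L = 50,000, the address step Δ equals the output frequency in Hz.

```python
import math

SAMPLE_RATE = 50_000          # R, samples per second
TABLE_LEN = 50_000            # L, one cycle of a 1-Hz sine sampled at R

# One-cycle sine table, analogous to the one loaded into the DSP-16 data buffer.
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_LEN) for i in range(TABLE_LEN)]


def table_lookup_tone(delta, n_samples, start_addr=0):
    """Generate n_samples by stepping the table address by `delta` (0 < delta < L).

    Because R == L, the output frequency is `delta` Hz, with 1-Hz resolution.
    """
    out = []
    addr = start_addr
    for _ in range(n_samples):
        out.append(SINE_TABLE[addr])      # sample sent to the DAC
        addr += delta                     # advance by the increment Δ
        if addr > TABLE_LEN - 1:          # "larger than 49,999"
            addr -= TABLE_LEN             # reset to the offset value Addr - 50,000
    return out


tone = table_lookup_tone(1000, 200)       # 4 ms of a 1000-Hz tone
```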

The TMS32025 program used for generating different tones is called GENTONE.ASM. This program can generate steady tones, step tones and glides. If we keep the increment value Δ fixed, the output is a single-frequency signal (steady tone). However, if we increase or decrease Δ linearly (with an increment or decrement of one for each step), the output is a glide signal with rising or falling frequency. For example, if Δ changes smoothly from 400 to 600, a glide tone starting at 400 Hz and ending at 600 Hz is generated. In addition, if Δ is changed by a fixed number larger than one at each step, a step tone is generated.


Insert  Figure  1  about  here 


Figure 1 shows an example of these three types of tones (duration = 120 ms): a 400 Hz steady tone, a glide from 900 Hz to 1140 Hz with a slope of 2 Hz per millisecond, and a step tone from 600 to 840 Hz having seven steps and the same slope as the glide. Note that the first and last steps (short steps) are half as long (in ms) as the "middle" steps (long steps). This definition is necessary so that the slope (see the dotted line in Figure 1) is represented correctly. Also, the glide tone is a special case of a step tone with (frequency excursion + 1) steps and a step height of 1 Hz.
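The seven-step example can be reproduced with a short calculation. The sketch below uses a hypothetical helper and works in frequencies and milliseconds only, not in GENTONE.ASM's parameter format. It makes the first and last steps half as long as the others; with that choice each middle step is centred on the glide line f(t) = f_start + slope × t, and the staircase starts at f_start at t = 0 and ends at f_end at t = duration.

```python
def step_tone_schedule(f_start, f_end, n_steps, duration_ms):
    """Return a list of (frequency_hz, length_ms) pairs, one per step (n_steps >= 2).

    Steps are equally spaced in frequency; the first and last ("short") steps
    last half as long as the middle ("long") steps.
    """
    height = (f_end - f_start) / (n_steps - 1)     # step height in Hz
    long_ms = duration_ms / (n_steps - 1)          # long (middle) step length
    short_ms = long_ms / 2.0                       # short (first/last) step length
    schedule = []
    for i in range(n_steps):
        length = short_ms if i in (0, n_steps - 1) else long_ms
        schedule.append((f_start + i * height, length))
    return schedule


steps = step_tone_schedule(600, 840, 7, 120)       # the step tone of Figure 1
for freq, length in steps:
    print(f"{freq:6.1f} Hz for {length:4.1f} ms")
```

Here the step height is 40 Hz and the long steps last 20 ms, giving the 2 Hz/ms slope of the glide.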


Insert  Figure  2  about  here 


Often, the output of the DSP-16 is passed through an analog gate. Without gating, the waveform has an abrupt amplitude transition at the signal onset and offset, which is often heard as a click. A software gating mechanism is incorporated into GENTONE.ASM so that external gating is not necessary. The software gate is defined by a gating envelope (see Figure 2), and the output waveform is multiplied by this envelope. Figure 2 shows a sine wave gated by a cosine-squared gating envelope. A parameter in the program determines whether the hardware gates or the software gate will be used. With the software gate, the program can generate signals having smooth envelopes. However, the gating envelope must be loaded into the data buffer before the signals can be generated.
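A cosine-squared gate of the kind shown in Figure 2 can be sketched as follows. The text does not give the rise and fall times, so the 10-ms ramps below are an assumption, and `cos2_gate` is a hypothetical name. On the DSP-16, the corresponding envelope values would instead be loaded into the data buffer before the signal is generated.

```python
import math


def cos2_gate(n_total, n_ramp):
    """Cosine-squared gating envelope: n_ramp-sample rise, unity plateau, mirrored fall."""
    if n_ramp < 1 or 2 * n_ramp > n_total:
        raise ValueError("need 1 <= n_ramp <= n_total / 2")
    env = [1.0] * n_total
    for i in range(n_ramp):
        g = math.cos(0.5 * math.pi * (1.0 - i / n_ramp)) ** 2   # ~0 at i = 0, rising toward 1
        env[i] = g                        # onset ramp
        env[n_total - 1 - i] = g          # offset ramp (time-reversed)
    return env


# Hypothetical example: 120-ms, 400-Hz tone at 50 kHz with assumed 10-ms ramps.
fs = 50_000
n_total, n_ramp = 120 * 50, 10 * 50
gate = cos2_gate(n_total, n_ramp)
tone = [math.sin(2 * math.pi * 400 * n / fs) for n in range(n_total)]
gated = [g * s for g, s in zip(gate, tone)]   # output waveform multiplied by the envelope
```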

The first part of GENTONE.ASM is the RESMON monitor supplied by Ariel Corporation, and the second part is the user function, which starts at address 1024. One user function (function #0) is defined in GENTONE.ASM. To use the program, the output sampling rate of the DSP-16 should be set to 50 kHz. When the high-level program calls the user function to generate signals, ten parameters have to be sent. The parameters are described in Table 1.


Insert  Table  1  about  here 


Note that when the sample rate is 50 kHz, the step length parameter defined in Table 1 can be written as

$$\text{step length} = \begin{cases} \dfrac{50 \times \text{Duration}}{\text{step number} - 1} & \text{(long step)} \\[1.5ex] \dfrac{50 \times \text{Duration}}{2\,(\text{step number} - 1)} & \text{(short step)} \end{cases} \qquad (1)$$

where Duration is the signal duration in milliseconds (the factor 50 is the number of samples per millisecond at 50 kHz). Furthermore, step length usually does not need 32 bits (16 bits are enough for up to 65,535 samples). However, we use 32 bits to represent step length to accommodate very long steps and to give users more flexibility.
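Equation (1) is simple arithmetic. The sketch below (hypothetical function name, assuming 50 samples per millisecond) evaluates it for the step tone of Figure 1 and for the 900 to 1140 Hz glide treated as a 241-step tone.

```python
def step_lengths_samples(duration_ms, step_number, samples_per_ms=50):
    """Eq. (1): (long, short) step lengths in samples; 50 samples/ms at a 50-kHz rate."""
    if step_number < 2:
        raise ValueError("a step tone needs at least two steps")
    long_step = samples_per_ms * duration_ms / (step_number - 1)
    short_step = long_step / 2
    return long_step, short_step


print(step_lengths_samples(120, 7))     # Figure 1 step tone: (1000.0, 500.0)
print(step_lengths_samples(120, 241))   # glide as a 241-step tone: (25.0, 12.5)
```

The second case shows that the short steps of a glide need not be a whole number of samples; the text does not say how GENTONE.ASM rounds such values.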
Before the main program can call GENTONE.ASM, a sine table has to be loaded into the DSP-16 data buffer. The sine table should be a one-cycle sine wave of 50,000 samples, stored in locations 0 to 49,999 of the data buffer. The DSP-16 has at least a 256K data buffer, so the sine table occupies about one-fifth of that buffer. The RESMON command CMD11 ("Write to buffer RAM") can be used to load the sine table; a TURBO PASCAL example of loading the sine table is given later. If the software gate is used, the gating envelope must also be loaded into the data buffer, starting at address 50,000. The length of the gating envelope is the same as that of the signal. Note that the "real" length of the envelope should be computed from the long step length and the short step length:

$$
\text{envelope length} = (\text{long step length}) \times (\text{step number} - 2) + (\text{short step length}) \times 2 \qquad (2)
$$

In fact, by using the gating envelope, we can impose an envelope of any shape on the output waveform.
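Eq. (2) can be checked the same way. The sketch below also lays out a per-sample frequency track under one reading of Table 1 and Figure 1. In that reading, the first and last steps are short, the middle steps are long, and each step raises the frequency by the step height. This layout is an assumption inferred from the table, not code taken from GENTONE.ASM. The function names are hypothetical.

```python
def envelope_length(long_step, short_step, step_number):
    """Eq. (2): (step_number - 2) long middle steps plus two short end steps."""
    return long_step * (step_number - 2) + short_step * 2

def frequency_track(start_hz, step_height_hz, step_number, long_step, short_step):
    """Per-sample frequency sequence for a step or glide tone.

    Assumed layout: short first and last steps, long middle steps, and each
    successive step raised by step_height_hz.
    """
    track = []
    for k in range(step_number):
        length = short_step if k in (0, step_number - 1) else long_step
        track.extend([start_hz + k * step_height_hz] * length)
    return track
```

Take the 900 to 1140 Hz glide of Figure 1 (step number 241, step height 1 Hz) with an assumed duration of 240 ms. Eq. (1) gives steps of 50 and 25 samples, and Eq. (2) then gives 12,000 samples, exactly 50 x 240.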

After loading the sine table and, if necessary, the gating envelope, one can issue the RESMON command CMD12 to invoke the user function. Since 10 parameters are passed to the function, the code format for CMD12 here is B280h. Finally, the parameters described in Table 1 are sent to the DSP-16 one by one. When the function is finished, it sends the value FFFFh back to the PC. This value must be read from the host port into the PC before any further host port communication can take place.
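The CMD12 codes quoted in this report all fit one pattern: B000h, plus the parameter count shifted left by six bits, plus a small function number. The codes are B280h for ten parameters to user function #0, and in Table 2 B200h, B101h, B103h, B144h and B246h. This pattern is our inference from those six codes, not documented RESMON behavior, so check it against the Ariel documentation before relying on it. The Python sketch also walks through the call sequence described above against a hypothetical host port. `FakeHostPort`, `write_word` and `read_word` are invented for illustration and are not an Ariel API.

```python
CMD12_BASE = 0xB000  # inferred from the codes in this report, not from Ariel docs

def cmd12_word(function_number, n_params):
    """Inferred CMD12 word: base | (parameter count << 6) | function number."""
    if not (0 <= function_number < 64 and 0 <= n_params < 64):
        raise ValueError("field out of range for the inferred 6-bit layout")
    return CMD12_BASE | (n_params << 6) | function_number

class FakeHostPort:
    """Hypothetical stand-in for the DSP-16 host port (illustration only)."""
    def __init__(self):
        self.sent = []
    def write_word(self, word):
        self.sent.append(word & 0xFFFF)
    def read_word(self):
        return 0xFFFF  # pretend the user function has finished

def call_user_function(port, function_number, params):
    """Send the command word, then each 16-bit parameter, then read FFFFh back."""
    port.write_word(cmd12_word(function_number, len(params)))
    for p in params:
        port.write_word(p)
    ack = port.read_word()  # must be read before any further host-port traffic
    if ack != 0xFFFF:
        raise RuntimeError(f"unexpected completion value {ack:04X}h")
```

FIND_PEAK is an exception. According to Table 2, it returns data instead of FFFFh, so its results would be read in place of the acknowledgement.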

GENTONE.ASM has been included in many TURBO PASCAL programs that control the experiments designed for the study of frequency discrimination in our lab (Neill, 1990). Because the stimuli are generated in real time, we can include adaptive procedures and roving-level methods in the experiments. Multi-subject adaptive procedures are also possible because of the high speed of the TMS320 programs. This saves a great deal of experiment time, and no disk space is needed to store the stimuli.


Insert  Figure  3  about  here 


An example of a TURBO PASCAL program segment is shown in Figure 3. Here we assume that the TMS320 program GENTONE.HEX (the machine code of GENTONE.ASM) has been loaded into the DSP-16's program/data memory and that all variables are declared correctly. The example consists of two parts. The first part shows how to load the sine table into the DSP-16's data buffer RAM, and the second part calls the "GENTONE function". In the first part of this example, the sine table is a binary file of integers in IBM PC format. The output sampling rate must also be initialized to 50 kHz.
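The Figure 3 example reads the sine table from a binary file. The Python sketch below produces such a file, assuming that "integers in IBM PC format" means 16-bit signed little-endian values (the size of a TURBO PASCAL Integer). It also shows why a 50,000-entry table suits a 50 kHz output rate: stepping through the table by f entries per sample yields exactly f Hz for any integer f. Whether GENTONE.ASM indexes the table this way is our assumption, and the names are hypothetical.

```python
import math
import struct

TABLE_LEN = 50_000  # one cycle, one table entry per Hz at 50 kHz
FS = 50_000         # output sampling rate (Hz)

def sine_table(peak=32767):
    """One cycle of a sine wave as 16-bit signed integers."""
    return [round(peak * math.sin(2 * math.pi * k / TABLE_LEN)) for k in range(TABLE_LEN)]

def table_to_bytes(table):
    """Pack as little-endian 16-bit signed integers (our reading of 'IBM PC format')."""
    return struct.pack(f"<{len(table)}h", *table)

def lookup_tone(table, freq_hz, n_samples, start_index=0):
    """Table-lookup oscillator: advance freq_hz entries per sample, wrapping at 50,000."""
    return [table[(start_index + freq_hz * n) % TABLE_LEN] for n in range(n_samples)]
```

Writing `table_to_bytes(sine_table())` to a file gives 100,000 bytes, two per sample.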
Multi-component complex tones

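A complex periodic tone is a signal consisting of multiple components, each with its own frequency and amplitude. A multi-component signal can be written as

$$
s(t) = \sum_{i=0}^{m} A_i \cos\bigl(2\pi (F_0 + i f_0)\, t\bigr) \qquad (3)
$$

where $A_i$ is the amplitude of the $i$-th component and $F_0 + i f_0$ is its frequency.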

The  line  spectrum  for  this  example  is  shown  in  Figure  4. 


Insert  Figure  4  about  here 


Using the TMS320 to generate multi-component complexes is not as simple as generating the pure tone and glide signals described above. A set of TMS320 functions has been developed for generating multi-component signals, and these functions are described in Table 2. Note that the functions "FIND_PEAK" and "NORMALIZE" are modified from the SI Data Acquisition Application program supplied with the DSP-16 by Ariel Corporation (Ariel Corporation, 1987).


Insert  Table  2  about  here 


All the functions except "FIND_PEAK" return the value FFFFh to the PC when they are finished. The returned value must be read by the host computer before any further host port communication can be issued.

With the functions in Table 2 specified, the algorithm for producing multi-component signals is as follows:

1. Create a sine table in the DSP-16's data buffer RAM at locations 0 to 49999, and a copy of the sine table at addresses 50000 to 99999. This copy is required by the design of the TMS320 functions described above.

2. Clear the signal buffer. The signal buffer starts at address 100000. Its length depends on the duration of the stimulus, and it ends at address 100000 + length - 1.

3. Generate one component of the signal at that component's frequency using "GEN_COMPONENT".

4.  Modify  the  amplitude  of  the  component.  This  can  be 
done  by  multiplying  values  in  the  signal  buffer  by  a 
constant  using  the  "NORMALIZE"  function. 

5. Add the component generated in steps 3 and 4 to the signal buffer.

6. Repeat steps 3 to 5 until all the components of the signal have been generated.
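Steps 1 to 6 can be mirrored in a short Python sketch that may help when checking the output of MULTI.ASM. The functions `find_peak` and `normalize` are rough analogues of FIND_PEAK and NORMALIZE, not their actual TMS320 code. Components are looked up from a sine table, so they start in sine phase rather than the cosine phase written in Eq. (3). Integer component frequencies are assumed, and all names are hypothetical.

```python
import math

FS = 50_000          # output sampling rate (Hz)
TABLE_LEN = 50_000   # one-cycle sine table, one entry per Hz at 50 kHz

def make_sine_table(peak=32767):
    return [round(peak * math.sin(2 * math.pi * k / TABLE_LEN)) for k in range(TABLE_LEN)]

def gen_component(table, freq_hz, n_samples):
    """Step 3: one component by table lookup (integer frequency assumed)."""
    return [table[(freq_hz * n) % TABLE_LEN] for n in range(n_samples)]

def find_peak(data):
    """Rough analogue of FIND_PEAK: largest absolute value and its index."""
    idx = max(range(len(data)), key=lambda i: abs(data[i]))
    return abs(data[idx]), idx

def normalize(data, gain):
    """Rough analogue of NORMALIZE: multiply every sample by a constant."""
    return [gain * x for x in data]

def multi_component(freqs_hz, amps, duration_ms):
    n = FS * duration_ms // 1000
    table = make_sine_table()                  # step 1
    signal = [0.0] * n                         # step 2: clear the signal buffer
    for f, a in zip(freqs_hz, amps):           # step 6: repeat for every component
        comp = gen_component(table, f, n)      # step 3
        comp = normalize(comp, a)              # step 4: set the component amplitude
        signal = [s + c for s, c in zip(signal, comp)]  # step 5: accumulate
    return signal

def to_int16(signal):
    """Scale the summed signal so its peak is 32767, then round to integers."""
    peak, _ = find_peak(signal)
    gain = 32767 / peak if peak else 0.0
    return [round(x * gain) for x in signal]
```

Python's `%` operator wraps the table index, so this sketch does not need the second copy of the sine table that step 1 requires.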

The above algorithm requires sampling, normalizing and adding each of the components in turn. Therefore, the signal cannot be generated in real time, despite the speed of the TMS32025 processor. Even so, this TMS320 program is much faster than the same algorithm written in a high-level language. For example, it takes 1.71 seconds to generate a 125 ms ten-component signal using the TMS320 program, but 6.21 seconds to generate the same signal using TURBO PASCAL on our 386 machine with a Cyrix CX-83D87 math coprocessor (Cyrix Corporation, 1990). Without a math coprocessor, the time needed to generate the same signal rises to 33.23 seconds. For this comparison, we used the same algorithm in the TMS320 program and the TURBO PASCAL program. The times for disk input/output and for loading the sine table are not included.

All of the functions in Table 2 have been incorporated into a TMS32025 program called MULTI.ASM. As usual, the RESMON monitor occupies the first 1K bytes, and it is followed by seven user functions. MULTI.ASM has been used in our lab to generate multi-component signals for "Profile Analysis" (Whitelaw et al., 1991) and "Complex Sound Discrimination" experiments.

Click  signals 

It is very convenient to use the DSP-16 to generate pulse-like signals. A pulse can be generated by sending the DAC a sequence of numbers containing only 0 and 32767 (the largest positive 16-bit value).
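A single-channel click train of the kind described above can be sketched in Python as a repeated cycle of 32767-valued samples followed by zeros. The parameter names are hypothetical. CLICKS.ASM itself produces two channels using the timing parameters of Table 3.

```python
HIGH = 32767  # largest positive 16-bit value, as in the text
LOW = 0

def click_train(n_clicks, period_samples, width_samples):
    """A single-channel pulse train made only of 0 and 32767 samples.

    period_samples and width_samples are hypothetical names; at 50 kHz,
    one sample lasts 0.02 ms.
    """
    if not 0 < width_samples <= period_samples:
        raise ValueError("need 0 < width <= period")
    one_cycle = [HIGH] * width_samples + [LOW] * (period_samples - width_samples)
    return one_cycle * n_clicks
```

For instance, `click_train(3, 500, 5)` gives three 0.1 ms clicks spaced 10 ms apart at 50 kHz.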

A TMS32025 program, CLICKS.ASM, has been developed in our lab to generate dual-channel click sequences. To meet the requirements of our experiments, the signal output of the program contains a dual-channel click train followed by a silent delay and a test click signal. A typical signal output and its parameters are shown in Figure 5. This signal is very similar to the stimuli used in Freyman's study of the precedence effect (Freyman et al., 1991).


Insert  Figure  5  about  here 


Like the other programs, CLICKS.ASM consists of two parts. The first part is the RESMON monitor, which occupies the first 1K bytes of the program memory, and the second part contains the user functions. In CLICKS.ASM, only one user function is declared: user function #0, which generates the signal output. User function #0 can be called from several high-level programming languages. Ten parameters must be sent to the function. The procedure for calling the function is as follows.

First, call the RESMON command (CMD12) to activate the user function. Because ten parameters need to be sent to the DSP-16, the code format issued for CMD12 is B280h. Following CMD12, the ten parameters are sent to the DSP-16. These parameters and their definitions are shown in Table 3.
Insert Table 3 about here


Note that every time value in Table 3 is expressed as 50 times the duration in milliseconds, that is, as a number of samples. This is because the sampling rate in CLICKS.ASM is set to 50 kHz.
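Table 3 expresses times in 50*ms units and sends several entries as differences, so a small conversion helper can avoid arithmetic slips. This Python sketch applies the notes in Table 3: lag1 - pulse_width, ici - lag1 - pulse_width, and (silent_delay - ici)/2. Rejecting odd or negative values is our own assumption about what the TMS32025 program can accept. The names are hypothetical.

```python
SAMPLES_PER_MS = 50  # 50 kHz sampling rate: one sample every 0.02 ms

def ms_to_samples(ms):
    """Convert a duration in ms to the 50*ms units used in Table 3."""
    return round(ms * SAMPLES_PER_MS)

def clicks_timing_words(lag1_ms, ici_ms, pulse_width_ms, silent_delay_ms):
    """Timing values actually sent to the DSP-16, following the notes in Table 3."""
    lag1 = ms_to_samples(lag1_ms)
    ici = ms_to_samples(ici_ms)
    width = ms_to_samples(pulse_width_ms)
    delay = ms_to_samples(silent_delay_ms)
    if (delay - ici) % 2:
        # Table 3 sends (silent_delay - ici)/2; an odd difference is not exact
        raise ValueError("silent_delay - ici must be an even number of samples")
    words = {
        "lag1": lag1 - width,
        "ici": ici - lag1 - width,
        "pulse_width": width,
        "silent_delay": (delay - ici) // 2,
    }
    if any(v < 0 for v in words.values()):
        raise ValueError("timing values must not be negative")  # assumed constraint
    return words
```

Suppose lag1 = 2 ms, ici = 10 ms, pulse_width = 0.1 ms and silent_delay = 50 ms. The values sent are then 95, 395, 5 and 1000. One sample is 1/50,000 s = 0.02 ms.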

A TURBO PASCAL example for calling this user function is given in Figure 6. In this example, we assume that all the variables and procedures have already been declared.


Insert  Figure  6  about  here 


Several TURBO PASCAL programs using CLICKS.ASM have been implemented in our lab to control experiments in the study of the "precedence effect" (Wallach et al., 1949). Using CLICKS.ASM instead of pre-recorded stimuli has several advantages. First, the user can easily control all the parameters of the signal from a high-level programming language to generate different stimuli. Second, since the parameters can be modified in the program, it is possible to use adaptive procedures and roving-level techniques in experiments. Third, the silent delay is controlled by the TMS32025 processor, so its duration can be specified very precisely. For example, if the sampling rate is 50 kHz, the duration can be specified to a precision of 0.02 ms (the reciprocal of the sampling frequency). Finally, no disk space is needed to save different stimuli, because the stimuli are generated in real time by the DSP-16. All of the programs have been tested on IBM PC/AT and 80386-compatible machines.

System  requirements 

All of the TMS320 programs described in this report are designed for use on the Ariel DSP-16 Data Acquisition system (Ariel Corporation, 1987) equipped with at least 256K words of data buffer RAM. The first 1K bytes of each program are always the RESMON monitor provided by Ariel Corporation. The user functions are assembled starting at address 1024. All the TURBO PASCAL examples shown in this report have been tested on IBM PC/AT and 386-compatible machines. The TMS320 programs can also interface with some other high-level languages, such as Microsoft C and BASIC (Ariel Corporation, 1987).

Availability of programs

The following listings are available from the authors upon request: the source code and assembled machine code of the TMS320 programs described in this report, a TURBO PASCAL example of initializing the DSP-16 and loading programs into it, and detailed TURBO PASCAL examples of using these TMS320 programs. The first author may be contacted via e-mail at hsuc@shs.ohio-state.edu.
Table 1

Parameters Used for User Function #0 in GENTONE.ASM

Parameter                  Description

Start frequency (Hz)       The start frequency of the signal.

Step height (Hz)           The frequency distance between two steps of the
                           signal. For a steady tone, the step height is
                           zero; for a glide tone, the step height is one.

Step number                The number of steps in a signal. A steady tone
                           has a step number of one; the step number of a
                           glide tone equals the frequency excursion plus
                           one.

Long step length (high)    The high 16 bits of the number of samples in
                           each long step. The middle steps of a signal are
                           assigned to long steps (see Figure 1).

(table continues)
Table 2

Functions used for generating multi-component complexes. Note that all addresses are 32-bit long integers consisting of a high 16-bit word and a low 16-bit word. Thus an address variable is defined by two 16-bit words.

Function       Description

DAC            Digital-to-analog conversion function. Outputs signals to
               both channels. Code format = B200h. Eight parameters:
               start address and stop address of channel #1, and start
               address and stop address of channel #2. Returns an FFFFh
               value to the PC when the function is done.

FIND_PEAK      Finds the peak value for a data segment. Code format =
               B101h. Four parameters: start address and stop address of
               the data. Returns the peak value and a 32-bit peak address
               (low and high). Note that this function must be executed
               before calling NORMALIZE.

(table continues)
Function       Description

               Resets the signal buffer to zero. Code format = B103h.
               Four parameters: start and stop addresses of the buffer.
               Returns an FFFFh value when the function is done.

               Generates a single frequency component. Code format =
               B144h. Five parameters: start and stop addresses of the
               signal buffer and the frequency of the component. The
               method used to generate signals in this function is very
               similar to that used in generating pure tones, which is
               described in the previous section. Returns an FFFFh value
               when the function is done.

               B246h. Nine parameters: the start and stop addresses of
               signal1, the start and stop addresses of signal2, and a
               flag. The flag is 0 for plus and 1 for minus. Returns an
               FFFFh value when the function is done.


Parameter              Definition

pattern                Defines the pattern for one cycle of the two-channel
                       pulse sequences: 1 for high output, 0 for low
                       output. For the signal shown in Figure 5, the value
                       of pattern is 33, which is 00100001 in binary.

repeat time            The number of cycles of the condition pulse train,
                       when there is one. This value is 1 when there is no
                       condition pulse.

segment                The number of segments in one cycle of the pulse
                       sequence. For the signal in Figure 5, the value of
                       segment is 4.

lag1 (50*ms)           Delay of the second pulse of the test signal. Note
                       that the value sent to the DSP-16 is
                       (lag1 - pulse_width).

(table continues)
Parameter              Definition

ici (50*ms)            Inter-click interval: the period of the pulse train.
                       Note that the value sent to the DSP-16 is
                       (ici - lag1 - pulse_width).

pulse_width (50*ms)    The width of the click.

p1                     Send 1 if channel B is active for the test signal;
                       send 0 if channel B is inactive for the test signal.

silent_delay (50*ms)   Delay between the condition pulse train and the
                       test signal. Note that the value sent to the DSP-16
                       is (silent_delay - ici)/2, because the TMS32025
                       program is designed this way.

p2                     Send 1 if the condition pulse train is activated;
                       send 0 if there is no condition pulse train.

(table continues)
Figure Captions

Figure 1. An example of the three types of tones generated by GENTONE.ASM. This figure shows a 400 Hz steady tone, a 7-step step tone having a step height of 40 Hz, and a glide tone from 900 to 1140 Hz.

Figure 2. (a) A sine waveform, (b) a cosine-squared gating envelope, (c) the modified sine waveform.

Figure 3. A segment of a TURBO PASCAL program, including examples of loading the sine table into the data buffer and calling GENTONE.ASM.

Figure 4. Line spectrum for a multi-component signal.

Figure 5. A typical signal generated by CLICKS.ASM. The parameters associated with the signal are also shown.

Figure 6. A TURBO PASCAL example for calling CLICKS.ASM.
To date, a limited number of studies have used dichotic stimulus presentation in profile analysis experiments. This study used a dichotic stimulus configuration for profile analysis that had not been previously used. Profile stimuli were presented both diotically and dichotically. The diotic signal consisted of twenty-one components with equal logarithmic spacing presented to both ears. The profile signal was created by incrementing the odd-numbered components of the signal by n dB. The dichotic signal was created by presenting the odd-numbered components to the right ear and the even-numbered components to the left ear. Two experiments were performed. The first experiment was designed to determine whether profile analysis abilities differed when stimuli were presented dichotically. The second experiment was performed to determine whether subjects might be using interaural intensity difference cues instead of a profile analysis strategy during the dichotic listening task. Results suggested that most subjects were able to perform the dichotic profile analysis task, with significant differences noted between diotic and dichotic results. Results of the second experiment suggest that profile analysis and interaural intensity differences do not result from the same mechanism.

PACS  Numbers:  43.66E;  43.66R 
INTRODUCTION


Studies  of  auditory  profile  analysis  have  demonstrated 
that  listeners  are  able  to  discriminate  very  small  intensity 
increments  added  to  a  multitonal  background  complex, 
supporting  the  concept  that  auditory  cues  remote  from  a 
target  signal  may  assist  in  discrimination  of  that  signal. 
During  the  past  decade,  a  number  of  stimulus  characteristics 
have  been  investigated  within  the  profile  analysis  paradigm. 
The  majority  of  these  studies  have  used  either  a  monotic  or 
diotic  stimulus  presentation.  To  date,  very  few  studies 
have  used  dichotic  stimulus  presentation. 

The limited number of studies that have used a dichotic stimulus have generally presented a single center component to one ear and twenty background components to the other ear simultaneously (e.g., Green & Kidd, 1983; Bernstein & Green, 1987) (Figure 1a). The results of these studies suggested that listening performance was inferior in a dichotic listening condition compared with a monotic listening condition. A conflicting result was presented by Fantini, Schooneveldt, & Moore (1989). In their study, the target component was presented to one ear and four background, or "flanking," components were presented simultaneously to the other ear (Figure 1b). They reported that dichotic presentation of the background components did not significantly affect profile analysis results compared with a monotic presentation mode. Fantini et al. suggested that listeners were able to combine information across ears in order to develop a profile, at least for the conditions reported in their study.

A  potential  problem  with  the  type  of  dichotic 
presentation  methods  previously  used  in  profile  analysis  tasks 
is  that  listeners  may  be  able  to  perform  the  task  from  the 
information  presented  to  a  single  ear.  The  listener  may  be 
able  to  use  information  presented  to  each  ear  separately  to 
make  a  correct  response,  without  needing  to  combine  the 
information  between  the  two  ears.  In  addition,  this  stimulus 
configuration  may  not  provide  a  fused  image  that  the  listener 
can  process  binaurally. 

A  type  of  dichotic  profile  has  been  developed  for  this 
study  in  which  the  stimulus  presented  to  either  ear  alone 
provides  no  profile  information  and  the  target  signal  can  only 
be  recognized  when  stimuli  are  combined  between  the  two  ears. 

The  development  of  this  type  of  stimulus  configuration  is 
based  on  cyclopean  perception  experiments  in  vision  proposed 
by  Julesz  (1971).  In  these  experiments,  different  information 
is  presented  to  each  eye  simultaneously,  and  the  target  will 
only  be  perceived  if  the  inputs  to  the  two  eyes  are  combined; 
if  the  target  is  perceived  when  a  stimulus  is  presented  to 
either  eye  individually,  Julesz  suggests  that  the  stimulus  is 
not  being  perceived  by  a  central  mechanism. 


A  limited  number  of  auditory  analogues  of  cyclopean 
perception  have  been  reported.  Huggins  and  Cramer  (1958) 
demonstrated  that  pitch  can  be  perceived  as  a  result  of 
dichotic  interaction  of  noise  stimuli  presented  to  both  ears 
simultaneously.  Houtsma  and  Goldstein  (1972)  illustrated 
that  a  mechanism  exists  for  combining  information  from  the  two 
ears  to  form  a  "central  spectrum".  The  findings  of  these 
studies  suggest  that  a  mechanism  exists  for  combining 
information  which  is  presented  between  the  two  ears. 
Experiments  such  as  these,  which  simulate  cyclopean  aspects 
of  the  auditory  system,  may  give  insight  into  the  central 
aspects  of  how  information  is  extracted  from  complex  auditory 
stimuli. Julesz (1971) suggests that despite the difficulties in developing techniques to investigate the "cyclopean cochlea", further definition of the location of a cyclopean ear is necessary. The use of dichotic profile stimuli is of interest in order to understand the central versus peripheral nature of the profile analysis task.

The present study was designed to use a cyclopean type of stimulus to explore the auditory system. For the profile analysis task, we used a type of dichotic stimulus that had not been used in previous dichotic profile analysis studies. In this study, the profile stimulus was split between the two ears so that the listener could not perceive the profile when the information was presented to either ear alone (Figure 1c). The profile could be perceived only when the stimuli were combined between the two ears. It was postulated that splitting the component frequencies between the two ears may allow the listener to develop a better fused image to be processed binaurally than the stimuli used in previous dichotic profile analysis experiments. This is supported by Dye (1990), who suggested that listeners perceived a single, fused, lateralized image even when multiple components of complex signals, spaced more than one critical band apart, were presented dichotically. In addition, it was of interest to use a dichotic profile stimulus in order to further understand the possible central nature of the profile analysis task.

The  major  purpose  of  this  study  was  to  determine  if 
profile  analysis  abilities  differ  when  stimuli  are  presented 
dichotically. 

EXPERIMENT 1: Dichotic Profile Analysis
I.  METHODS 
A.  Subjects 

Four young adult females, ranging in age from 19 to 32 years, participated in this study. All subjects had normal hearing acuity and normal middle ear function bilaterally. Three of the four listeners were college students paid for their participation in this study; the fourth listener was one of the authors. One of the subjects had prior experience as a listener in psychoacoustic tasks, and the other three subjects participated in the pilot experiment for this study (Whitelaw, Hsu, and Feth, 1991). Each subject had at least twenty hours of practice in the task to ensure that performance was stable before data collection began.

B.  Stimuli 

Signals were digitally generated on-line using an Ariel DSP-16 signal processing subsystem housed in a Zenith Z-159 microcomputer. These signals were converted into analog waveforms for each listening interval using a two-channel 16-bit digital-to-analog converter at a sampling rate of 50,000 Hz per channel (Hsu and Feth, 1992).

Profile stimuli were presented both diotically and dichotically. The diotic signal consisted of twenty-one components with equal logarithmic spacing (Figure 1c). The mean level of the signal was 50 dB SPL per component, and the level per component was equal in the standard signal. To create the profile, the odd-numbered components were incremented by n dB, where n = 0.5, 1, 2, 3, 4, 6, or 7 dB. All twenty-one components were presented to both ears simultaneously. The dichotic signal was created by presenting the odd-numbered components to the right ear and the even-numbered components to the left ear. Stimuli were presented with a roving level to limit the listeners' ability to base discrimination decisions on simple stimulus level changes (Berliner and Durlach, 1973). A roving range of 20 dB (+/- 10 dB), varying in 1 dB steps, was employed.

Three center frequencies (500 Hz, 1000 Hz, and 2000 Hz) were employed for both diotic and dichotic stimulus presentation. Equal logarithmic spacing, corresponding to a frequency ratio of 1.175 between adjacent components, was used for all three center frequencies. Each signal was presented for 200 ms, with an inter-trial interval of approximately 500 ms.
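The stimulus description above can be summarized in a Python sketch of the component frequencies, levels and ear assignment. It assumes that the center frequency is the 11th (middle) of the 21 components and that components are numbered from 1, so the odd-numbered set has eleven members. The function names are hypothetical. Converting these levels to DAC amplitudes would also need the system calibration, which is not given.

```python
import random

RATIO = 1.175          # frequency ratio between adjacent components
N_COMPONENTS = 21
BASE_LEVEL_DB = 50.0   # mean level per component, dB SPL

def component_freqs(center_hz):
    """21 log-spaced frequencies; the 11th (middle) one is assumed to be the center."""
    return [center_hz * RATIO ** (k - 10) for k in range(N_COMPONENTS)]

def profile_levels(increment_db, rove_db=0):
    """Per-component levels (dB SPL) for components numbered 1..21.

    Odd-numbered components get the increment; the whole stimulus is
    shifted by the roving level rove_db.
    """
    return [BASE_LEVEL_DB + rove_db + (increment_db if num % 2 == 1 else 0.0)
            for num in range(1, N_COMPONENTS + 1)]

def dichotic_split(items):
    """Odd-numbered components to the right ear, even-numbered to the left."""
    right = [x for num, x in enumerate(items, start=1) if num % 2 == 1]
    left = [x for num, x in enumerate(items, start=1) if num % 2 == 0]
    return right, left

def draw_rove(rng):
    """Roving level: -10 to +10 dB in 1 dB steps."""
    return rng.randint(-10, 10)
```

At a center frequency of 1000 Hz, these components span roughly 200 Hz to 5 kHz.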

Listeners were tested in separate single-walled sound-isolated rooms. Three listeners could be tested at the same time. Each listener was seated facing a monitor and computer keyboard (Radio Shack Color Computer II).

Signals were presented binaurally via Sennheiser HD414SL headphones. Listening intervals were indicated with visual cues, and visual feedback displaying the correct response was provided immediately after each trial. All subjects listening at the same time heard the same signals, and their responses were recorded separately. Response recording and visual feedback were controlled by the Zenith Z-159 microcomputer.

Data collection was carried out in blocks of 50 trials. The intensity increment was held constant over each group of three blocks. Short breaks were provided after every three blocks, and longer breaks were given after six blocks were completed. Data from approximately 900 trials, or 6 groups of three blocks, were obtained in each daily session.

A 2 cue-2 alternative forced-choice (2Q-2AFC) procedure was used in this study. For the profile analysis task, a standard profile, consisting of twenty-one components with equal level per component, was always presented in intervals one and four. The signal was generated by adding an increment to the level of the odd-numbered components, and it was presented in either interval two or interval three with equal probability. A roving-level paradigm was employed, with a roving range of 20 dB (+/- 10 dB) varying in 1 dB steps. In this paradigm, the overall intensity level for each trial was randomized, minimizing the listener's use of intensity anchors.
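The trial structure can be sketched in Python as follows. Intervals 1 and 4 carry the standard, and the signal is placed in interval 2 or 3 at random. Following the text, one rove value is drawn per trial. The function names and the layout of the returned tuple are hypothetical.

```python
import random

def make_trial(rng, increment_db):
    """One 2Q-2AFC trial: per-interval increments, the rove, and the correct answer.

    Intervals 1 and 4 always hold the standard (no increment); the signal goes
    into interval 2 or 3 with equal probability. One rove value is drawn per
    trial, -10 to +10 dB in 1 dB steps.
    """
    signal_interval = rng.choice((2, 3))
    increments = {i: (increment_db if i == signal_interval else 0.0) for i in (1, 2, 3, 4)}
    rove_db = rng.randint(-10, 10)
    return increments, rove_db, signal_interval

def score(responses, answers):
    """Percent correct over a block of trials."""
    hits = sum(r == a for r, a in zip(responses, answers))
    return 100.0 * hits / len(answers)
```

A block of 50 trials would call `make_trial` 50 times and score the keyboard responses against the returned answers.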

The  listeners'  task  was  to  determine  which  of  the  two 
center  intervals  of  the  2Q-2AFC  paradigm  contained  the  target 
signal.  Listeners  indicated  their  responses  by  pressing  the 
appropriate  key  on  the  microcomputer  keyboard. 

To ensure that listeners required stimulus input from both ears in order to discriminate a "profile", the eleven-component signal with the intensity increment added was presented to the right ear only, in a separate block of trials. A 6 dB increment was used because, in the dichotic case, most listeners could easily discriminate a 6 dB intensity increment. In this task, all subjects' performance fell to chance over at least 300 trials.
III. RESULTS


Percent correct discrimination was plotted as a function of intensity increment for each of the three center frequencies. Each percent correct score was the average of 300 trials (6 blocks of 50 trials). All six blocks fell within one standard error of the mean of the proportion for a binomial variable (Krajewski and Ritzman, 1990). This required the percent correct score for every block to be within +/- 8% of the mean percent correct score for the six blocks. If the six blocks did not all meet this criterion, another six blocks were collected and the process was repeated.
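The block-stability rule can be expressed as a short Python check. The sketch treats the +/- 8% tolerance as percentage points around the six-block mean. `run_block` and the cap on repeated rounds are hypothetical additions for illustration.

```python
def blocks_stable(block_scores, tolerance=8.0):
    """True if every block's percent correct is within +/- tolerance of the mean."""
    mean = sum(block_scores) / len(block_scores)
    return all(abs(s - mean) <= tolerance for s in block_scores)

def collect_until_stable(run_block, n_blocks=6, max_rounds=10):
    """Collect groups of n_blocks until the criterion is met.

    run_block is a hypothetical callable returning one block's percent correct;
    max_rounds is an assumed safety cap not mentioned in the text.
    """
    for _ in range(max_rounds):
        scores = [run_block() for _ in range(n_blocks)]
        if blocks_stable(scores):
            return sum(scores) / n_blocks
    return None
```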

To facilitate fitting a line to the data, percent correct scores were converted to d' values for a 2AFC task (McNichol, 1972). d' values were plotted as a function of intensity increment for each center frequency, and a simple linear regression line was fit to each psychometric function. Psychometric functions at each frequency for individual listeners are presented in Figures 2a-d. Discrimination thresholds were derived from the regression equation as the intensity increment corresponding to a d' of 0.95, the value that corresponds to 75% correct. Discrimination thresholds for individual subjects are presented in Table 1.
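For a 2AFC task, d' is commonly computed as the square root of 2 times the z-score of the proportion correct. At 75% correct this gives d' = 0.954, consistent with the value of .95 used above. The Python sketch below applies that conversion, fits an ordinary least-squares line to d' versus increment, and solves for the threshold increment. It assumes proportions strictly between 0 and 1, since perfect scores would need a correction not described in the text. The names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def dprime_2afc(p_correct):
    """d' for a 2AFC task: sqrt(2) times the z-score of proportion correct."""
    return sqrt(2) * NormalDist().inv_cdf(p_correct)

def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def threshold(increments_db, pcs, target_p=0.75):
    """Increment (dB) at which the fitted d' line reaches the d' for target_p."""
    d = [dprime_2afc(p) for p in pcs]
    a, b = fit_line(increments_db, d)
    return (dprime_2afc(target_p) - a) / b
```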

Several points emerge from inspection of the profile results for individual subjects. For the diotic listening conditions, all subjects were able to achieve threshold for all frequency conditions. In the dichotic listening condition, however, only three of the four subjects achieved 75% correct performance. The fourth subject was unable to reach 75% correct in the dichotic condition at any of the center frequencies. This observation is consistent with previous profile analysis experiments, in which considerable individual variability in performance on this task has been noted (Green, 1988b).

Visual inspection of the psychometric functions reveals
that the dichotic results are shifted to the right relative
to the diotic results for all frequency conditions and for all
listeners. Diotic performance was compared to dichotic
performance at each center frequency for the listeners who
were able to perform both the diotic and dichotic tasks.

This comparison was made with a t-test for the
significance of the difference between independent regression
slopes (Cohen and Cohen, 1983). The slopes of the diotic and
dichotic functions differed significantly at 500 Hz and
1000 Hz for all three subjects. At 2000 Hz, the slopes
differed significantly for one subject (S2) but not for the
other two subjects (S1 and S3).
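A minimal sketch of a test for the difference between two independent regression slopes in the form found in standard regression texts: t = (b1 - b2) / sqrt(s2_pooled * (1/SSx1 + 1/SSx2)), with the residual variance pooled over both fits and n1 + n2 - 4 degrees of freedom. We assume, without having checked it against the source, that this corresponds to the Cohen and Cohen (1983) test used here; converting t to a p-value is left out.

```python
from math import sqrt

def _regression_stats(x, y):
    """Slope, sum of squares of x, residual sum of squares, and n for a least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, ss_res, n

def slope_difference_t(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H0: the two population slopes are equal."""
    b1, sxx1, ssr1, n1 = _regression_stats(x1, y1)
    b2, sxx2, ssr2, n2 = _regression_stats(x2, y2)
    df = n1 + n2 - 4
    pooled_var = (ssr1 + ssr2) / df  # residual variance pooled over both regressions
    se_diff = sqrt(pooled_var * (1.0 / sxx1 + 1.0 / sxx2))
    return (b1 - b2) / se_diff, df

# Hypothetical d' values for a steeper (diotic-like) and a flatter (dichotic-like) function
inc = [1.0, 2.0, 3.0, 4.0]
t, df = slope_difference_t(inc, [1.0, 2.1, 2.9, 4.0], inc, [1.0, 1.6, 1.9, 2.5])
print(f"t({df}) = {t:.2f}")
```

The data in the usage lines are invented for illustration and are not taken from Figures 2a-d.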

d'  values  were  averaged  across  the  three  subjects  who
were  able  to  achieve  threshold  for  both  the  diotic  and
dichotic  listening  task.  Although  the  results  of  the  fourth
subject  are  not  included  in  this  analysis,  they  will  be
included  in  the  general  discussion  of  results.

Psychometric  functions  averaged  across  the  subjects  for 
each  center  frequency  are  shown  in  Figures  3a-c.  Simple 
regression  lines  were  fit  to  these  data  and  are  also  shown  in 
these  figures.  Diotic  discrimination  thresholds  ranged  from 
2.0  to  2.2  dB  and  dichotic  discrimination  thresholds  ranged 
from  4.5  to  4.8  dB  across  the  test  frequencies. 

A three-way analysis of variance (ANOVA), with stimulus
presentation (diotic vs. dichotic), center frequency, and
listener as factors, was used to determine the effects of
stimulus presentation and of center frequency. This analysis
confirms the conclusions drawn from visual inspection of the
data: significant differences were observed between the diotic
and dichotic stimulus conditions (F = 9.21, p < .01) and between
listeners (F = 69.02, p < .001). No significant differences
were found among the three frequency conditions, and no
significant listener-by-stimulus or listener-by-frequency
interactions were found.

In discussing preliminary results of this study,
Bernstein (1990) suggested that subjects might be using
interaural intensity difference cues, rather than a profile
analysis strategy, during the dichotic listening task. A second
experiment was performed to test this hypothesis.

EXPERIMENT  2:  INTERAURAL  INTENSITY  DIFFERENCES

The ability of the auditory system to detect very small
differences in intensity between stimuli presented to the two
ears is well documented. Nuetzel (1982) reported that when
interaural intensity difference (IID) thresholds were measured
for pure tones, listeners were sensitive to small differences
in intensity even when the frequencies of the two tones were
separated by more than half an octave. He also noted a
frequency effect: IID thresholds were lower for higher-frequency
stimuli. Based on Nuetzel's research, Bernstein (1990)
suggested that, when dichotic profile analysis stimuli are
presented, listeners may be able to use a binaural intensity
summation cue and focus on an interaural intensity difference
between the center frequency and an adjacent component,
instead of analyzing spectral shape.

I.  METHODS 

A.  Subjects 

The same four subjects who participated in Experiment 1
participated in this experiment.

B.  Stimuli 

Pure-tone stimuli for measuring interaural intensity
difference thresholds were digitally generated on-line, using
the dichotic profile signal program modified to generate one
component per ear. Subjects were trained on the interaural
intensity difference task with pure tones at three center
frequencies (500, 1000, and 2000 Hz). However, because three of
the four subjects were unable to perform the task, only
2000 Hz was used in this experiment. Each signal was presented
for 200 ms, with an inter-trial interval of approximately
500 ms.

C.  Procedure

The  experimental  set-up  for  the  interaural  intensity 
difference  (IID)  experiment  was  similar  to  that  used  for  the 
profile  analysis  experiment.  Instrumentation  for  the 
interaural  intensity  difference  stimuli  is  the  same  as  that 
used  for  dichotic  profile  analysis. 

For  the  IID  task,  a  single  frequency  component  was  always 
presented  to  each  ear  in  intervals  one  and  four  of  the  2Q-2AFC 
task.  An  intensity  increment,  added  to  the  component  directed 
to  the  right  ear  (Channel  1),  was  presented  in  either  interval 
two  or  three.  This  procedure  is  similar  to  the  procedure  used 
for  the  IID  experiment  by  Nuetzel  (1982,  1991). 

Data collection for the interaural intensity difference
task began after the listeners had many months of experience
with the profile task, using stimuli of both 500 ms (pilot
study) and 200 ms duration.

Three conditions were tested for the IID task: a fixed-level
condition with a 2000 Hz signal presented to both ears; a
roving-level (+/-10 dB) condition with a 2000 Hz signal
presented to both ears; and a roving-level (+/-10 dB) condition
with a 2000 Hz signal presented to the right ear and a 2350 Hz
signal presented to the left ear. A 6 dB intensity increment
was added to the right-ear stimulus in all conditions. This
increment was selected because, in the pilot dichotic profile
analysis studies, subjects capable of performing the profile
analysis task with dichotic stimulus presentation were easily
able to discriminate a 6 dB increment (Whitelaw, Hsu, and Feth,
1991).
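To make the trial structure concrete, here is a minimal sketch of how one IID trial could be generated. The 70 dB base level is a hypothetical value (the text does not give one), the +/-10 dB rove is taken from Table 2, and the assumption that a fresh rove is drawn for each interval and applied equally to both ears is ours. The tone frequencies (2000 Hz in both ears, or 2000 Hz right and 2350 Hz left) do not affect the level schedule and are omitted.

```python
import random

def iid_trial(base_db=70.0, increment_db=6.0, rove_db=10.0, rng=None):
    """One hypothetical 2Q-2AFC trial for the IID task.

    Returns (levels, signal_interval): levels holds four (left_db, right_db)
    pairs for intervals 1-4; signal_interval (2 or 3) carries the increment
    on the right-ear (Channel 1) component. rove_db=0 gives the fixed condition.
    """
    rng = rng or random.Random()
    signal_interval = rng.choice((2, 3))
    levels = []
    for interval in (1, 2, 3, 4):
        # Assumption: a fresh rove is drawn for each interval and applied to both ears.
        rove = rng.uniform(-rove_db, rove_db)
        left = base_db + rove
        right = left + (increment_db if interval == signal_interval else 0.0)
        levels.append((left, right))
    return levels, signal_interval

levels, answer = iid_trial(rng=random.Random(1))
for i, (left, right) in enumerate(levels, start=1):
    print(f"interval {i}: left {left:5.1f} dB, right {right:5.1f} dB")
print("increment in interval", answer)
```

Intervals 1 and 4 never carry the increment, matching the cue intervals described in the procedure.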
III.  RESULTS 

The interaural intensity difference results presented here
were obtained after subjects had practiced an average of 3500
trials. Three of the four subjects (S1, S2, and S3) showed no
gradual improvement in performance with practice.

Interaural intensity discrimination results for individual
subjects are presented in Table 2. Two of the subjects (S1 and
S4) achieved threshold in the fixed-level condition, while the
other two (S2 and S3) did not. Of the two subjects able to
perform the task in the fixed-level condition, only one (S4)
continued to perform well in the roving-level conditions; the
other subject's performance fell to chance. It is of interest
that the subject who performed best on the interaural
intensity difference task (S4) is the subject who was unable
to achieve threshold in the dichotic profile analysis task.

Overall, three of the four subjects who participated in
this study were more sensitive to diotic profile presentations
than to equivalent dichotic presentations. The fourth subject
was unable to perform the dichotic profile task but was able
to perform the task in the diotic condition. Statistical
analysis confirmed that the diotic-dichotic differences were
significant; however, the effect of center frequency was not.
Only one of the four subjects was capable of performing the
IID task at a level substantially above chance, and that
subject was the only one who could not perform the dichotic
profile task.

IV. GENERAL  DISCUSSION 

The  results  obtained  in  this  study  demonstrate  diotic 
versus  dichotic  differences  similar  to  those  reported  by 
Fantini  et  al.  (1989)  for  a  monotic-dichotic  comparison.  In 
their  study,  which  used  only  a  2000  Hz  center  frequency,  a  3 
dB  difference  was  reported  between  monotic  and  dichotic 
results,  with  the  thresholds  found  to  be  12  dB  and  9  dB  lower 
than  the  reference  condition  for  the  monotic  and  dichotic 
conditions,  respectively.  The  2.3  dB  difference  observed  in 
the  present  study  between  diotic  and  dichotic  thresholds  at 
2000  Hz  is  consistent  with  Fantini  et  al.,  despite 
methodological  differences  between  these  studies. 

These results appear to differ substantially from those
reported by Green and Kidd (1983) and Bernstein and Green
(1987). However, the differences may be related to the way the
results were reported. Many profile analysis studies,
including those of Green and Kidd (1983) and Bernstein and
Green (1987), have expressed thresholds as the signal level
relative to the level of a single component in the stimulus
profile. To compare their results directly with those of these
two earlier studies, Fantini et al. (1989) converted the
signal thresholds to I + delta I values and then calculated
masking release thresholds. Even when these converted results
are considered, the dichotic release from masking remains
smaller than the monotic release in both studies. For Green
and Kidd (1983), the difference between monotic and dichotic
thresholds is 6 dB, which Fantini et al. contend may be
comparable to the difference obtained in their own study.
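The text does not say exactly how Fantini et al. (1989) performed this conversion. Assuming the signal is added in phase to the standard component, a threshold expressed as signal level S re: the component, S = 20 log10(dA/A), corresponds to a level increment dL = 10 log10((I + dI)/I) = 20 log10(1 + 10^(S/20)). A minimal sketch of both directions:

```python
import math

def signal_re_standard_to_increment_db(s_db):
    """Level change 10*log10((I + dI)/I) of a component when a signal whose
    level re: the component is s_db = 20*log10(dA/A) is added in phase."""
    return 20.0 * math.log10(1.0 + 10.0 ** (s_db / 20.0))

def increment_db_to_signal_re_standard(delta_l_db):
    """Inverse of the mapping above (delta_l_db must be > 0)."""
    return 20.0 * math.log10(10.0 ** (delta_l_db / 20.0) - 1.0)

for s in (-20.0, -10.0, 0.0):
    print(f"signal re standard {s:6.1f} dB -> increment "
          f"{signal_re_standard_to_increment_db(s):.2f} dB")
```

Under this assumption, a signal equal in level to the component (0 dB) corresponds to a 6.02 dB increment, so the 6 dB increment used in Experiment 2 is almost exactly a 0 dB signal-to-standard ratio.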

When  Bernstein  and  Green's  (1987)  results  were  converted 
to  masking  release  thresholds,  a  minimal  masking 
release  (2.7  dB)  was  noted  for  the  dichotic  listening 
condition,  with  a  7  dB  difference  noted  between  monotic  and 
dichotic  thresholds.  The  overall  stimulus  level  used  in  their 
study  was  higher  than  that  used  by  Green  and  Kidd  (1983), 
Fantini  et  al.  (1989),  or  the  present  study,  which  may  have 
influenced  the  thresholds  obtained. 

When profile analysis thresholds obtained for individual
listeners are discussed, the "level detection limit" for the
profile task must be considered. Green (1988) noted that the
level limit may be the size of the intensity increment at
which the listener bases the discrimination decision on
absolute intensity differences rather than on spectral shape.
The level limit depends on the range over which the stimulus
level is roved. Green (1988) has offered 0.2346 times the
width of the roving range as the level detection limit for a
2AFC task. In the present study, the level limit is
approximately 5 dB. Several of the dichotic thresholds
obtained in the present experiment (6 dB at 500 Hz for S3 and
7 dB at 1000 Hz for S2) exceed the level detection limit. This
suggests that, under these conditions, subjects may have based
their decisions on a change in intensity level within one
critical band rather than on differences in spectral shape.
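The 0.2346 factor can be reproduced under a simple model (our reconstruction, not quoted from Green): the listener compares only the absolute level of the target component, internal noise is ignored, and the overall level is roved uniformly over a range of width W. In 2AFC the signal interval is chosen whenever U1 + D > U2, with U1 and U2 independent and uniform on [0, W], so Pc = 1 - (W - D)^2 / (2 W^2) for 0 <= D <= W, and D = W [1 - sqrt(2 (1 - Pc))]. The factor 0.2346 appears at Pc = 1/sqrt(2), about 70.7% correct; at the 75% criterion used for the thresholds here, the same model gives about 0.293 W. The 20 dB rove width below is an assumption that matches the +/-10 dB rove listed in Table 2.

```python
import math

def level_limit_db(rove_width_db, pc=1.0 / math.sqrt(2.0)):
    """Increment needed to reach proportion correct pc in 2AFC when only the
    absolute level of one component is used and the level is roved uniformly
    over rove_width_db (no internal noise)."""
    if not 0.5 <= pc < 1.0:
        raise ValueError("pc must be in [0.5, 1)")
    return rove_width_db * (1.0 - math.sqrt(2.0 * (1.0 - pc)))

print(round(level_limit_db(1.0), 4))         # 0.2346
print(round(level_limit_db(20.0), 2))        # 4.69 (assumed 20 dB rove)
print(round(level_limit_db(20.0, 0.75), 2))  # 5.86
```

With a 20 dB rove, the 0.2346 factor gives about 4.7 dB, consistent with the "approximately 5 dB" limit stated above.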

There is evidence, however, that subjects were not
merely using absolute level differences to detect the addition
of the target increment. Inspection of the psychometric
functions obtained in this study shows them to be smooth, which
may indicate that listeners used a similar listening strategy
at all intensity increment steps. Had a discontinuity been
observed in the psychometric function, with a marked
improvement in the region of the threshold, a change in
listening strategy might have been suspected.

The  differences  between  diotic  and  dichotic  performance 
on  the  profile  analysis  task  in  this  study  may  suggest  a 
different  mechanism  or  cue  available  to  the  listener  in  the 
dichotic  task.  One  model  proposed  to  explain  the  profile 
analysis  phenomena  is  the  channel  theory  (Durlach,  Braida, 
and  Ito,  1986;  Green,  1988).  The  channel  theory  postulates 
that  the  signal  is  analyzed  by  the  ear  with  a  bank  of 
independent  filters.  The  output  of  each  filter  is  transformed
so  that  the  pressure  level  obtained  from  each  channel  is 
estimated,  based  on  a  nonlinear  transformation  process  and 
temporal  integration.  The  estimate  is  biased  by  random, 
internal  noise.  This  model  asserts  that  the  listener  must 
determine  if  the  output  of  a  particular  channel  "stands  out" 
because  of  the  addition  of  a  more  intense  signal  or  because 
of  this  internal  noise.  Since  the  filters  are  assumed  to  be 
independent  of  each  other,  performance  should  be  unaffected 
by  moving  some  of  the  signal  components  to  the  other  ear.  That 
is,  the  same  performance  would  be  anticipated  for  both  diotic 
and  dichotic  listening  conditions.  This  does  not  appear  to 
be  the  case  in  the  present  study. 

Comparing  the  slopes  of  the  psychometric  functions  of 
dichotic  to  diotic  results  may  also  provide  additional  support 
for  a  different  mechanism  for  dichotic  profile  analysis.  If 
the  same  mechanism  were  responsible  for  both  diotic  and 
dichotic  profile  analysis,  the  slope  of  the  function  for  the 
dichotic  results  should  be  similar  to  that  of  the  diotic 
results,  merely  shifted  to  the  right.  This  hypothesis  is  based 
on  previous  research  which  suggested  that  diotic  thresholds 
were  lower  than  thresholds  obtained  from  dichotic  stimulus 
presentation.  From  inspection  of  the  psychometric  functions, 
the  dichotic  results  are  not  only  shifted  to  the  right  but  the 
slope  is  also  flatter  than  for  the  diotic  condition.  This 
observation  is  confirmed  by  the  statistical  tests  of 
difference  for  the  slopes  of  the  lines  for  all  subjects  at  500
and  1000  Hz  and  for  one  subject  at  2000  Hz.  This  lends  support 
to  the  notion  that  the  dichotic  cue  may  be  different  from  the 
cues  used  to  discriminate  the  diotic  profile. 

The  lack  of  a  significant  frequency  effect  might  support 
the  concept  that  subjects  are  using  something  other  than  an 
intensity  cue  for  extracting  the  profile.  A  wealth  of 
research  has  indicated  that  a  dual  mechanism  exists  for 
processing  binaural  information.  For  stimuli  below  1500  Hz, 
binaural  processing  appears  to  be  based  primarily  on 
interaural  time  cues.  For  stimuli  above  1500  Hz,  both 
interaural  time  and  interaural  intensity  cues  appear  to  be 
important  in  the  interpretation  of  the  signal  (Jeffress, 
1971).  Performance  on  the  IID  task  was  found  to  be  consistent 
with  this  notion.  IID  detection  was  found  to  improve  for 
tones  presented  at  2000  Hz  and  above  (Nuetzel,  1982).  It  was 
anticipated  that  if  dichotic  profile  analysis  were  based 
merely  on  an  interaural  intensity  difference,  lower 
discrimination  thresholds  would  be  observed  for  higher  center 
frequencies.  This  result  was  not  observed  for  the  present 
study. 

The results of the subject (S4) who could not achieve
threshold in the dichotic condition require further comment.
In many early profile analysis studies, listeners who failed
to achieve threshold after a number of trials were excluded
from the experiment (Henn and Turner, 1990). This subject was
included in the present study because her results support the
notion that profile analysis may not be a "universal effect"
(Henn and Turner, 1990). She may be using a different strategy
to listen for spectral shape information, one that might be
representative of a strategy used by a subgroup of listeners
in these types of tasks.

To extract the profile from the signal used in the
present experiment, it appears that listeners require that
information be presented to both ears simultaneously. However,
alternatives to the combination of dichotic information into
an integrated spectral percept must be considered. Instead of
fusing the information from the two ears into a single image,
listeners might be able to monitor the intensity level in each
ear simultaneously and compare the absolute intensity levels
between the two ears, rather than basing the decision on
spectral shape (Mason, 1991).

Independent intensity level comparisons are the basis
for the interaural intensity difference (IID) experiment. It
has been postulated that listeners can use an intensity
comparison between the two ears to detect very small intensity
increments, particularly with tonal stimuli in the 2000 to
3000 Hz region. Subjects in this study were tested first under
the conditions for which Nuetzel (1982) reported maximum
sensitivity to interaural intensity differences.

It is somewhat surprising that the three subjects who
were able to perform the dichotic profile task were unable to
achieve better than chance performance on the roving-level IID
task after more than 3000 trials. Based on these results, it
would appear that listeners may not be using an interaural
absolute-level detection strategy for the profile task. If
they were, the same subjects should easily have performed as
well on the IID task as on the dichotic profile analysis task.

Only one subject (S4) demonstrated stable and
consistently good performance on the IID task. She is also
the only subject who did not achieve threshold on the
dichotic profile analysis task. On the IID task, her
performance was not substantially degraded when the
roving-level condition was introduced after training on the
fixed-level condition; in the roving-level condition, the
performance of the other three subjects fell to chance. S4
reported that she did not hear a 'fused' image in the profile
task and that she was not using a 'movement' cue in the IID
task. The results obtained for this subject, along with her
reported subjective impressions, suggest that she may have
been monitoring the intensity at each ear individually and
then comparing the waveforms at the two ears. Since she was
unable to achieve threshold in the dichotic profile task, this
strategy appears to be ineffective for discriminating spectral
shape.

The findings of the present study suggest that most
subjects are able to extract a profile from a dichotic signal
presentation, although the effect is not as robust as when
the signal is presented diotically. This ability to detect
spectral shape information when the signal components are
divided between the two ears may be interpreted as evidence
for a central mechanism underlying profile analysis effects.

A parallel may exist between the dichotic profile task
and other auditory tasks thought to be centrally mediated.
For example, a 'residue' pitch can be heard when there is no
possibility of interaction of the auditory signals at the
level of the cochlea (Houtsma and Goldstein, 1972; Moore,
1988). Listeners were asked to recognize melodies
corresponding to missing fundamentals. They were able to
extract the residue pitch and identify melodies when the
signal components were presented dichotically; however,
monotic performance was superior to dichotic performance at
high intensities. Thus, although dichotic performance is
inferior to monotic performance, the effect persists.

The superior performance in the monotic condition over
the dichotic condition at high intensities has been attributed
to the presence of combination tones, which provide additional
harmonics that serve to strengthen the pitch in this condition
(Houtsma and Goldstein, 1972). Combination tones do not occur
for sinusoids presented dichotically, and they are much weaker
for monotic presentation at intensity levels near threshold.
Therefore, the dichotic performance obtained in the residue
pitch experiments may reflect the "true" performance of the
auditory system, and the superior monotic performance may be
an artifact of additional cues made available by
nonlinearities of the cochlea.

A similar argument could be applied to the profile
analysis results obtained in the present study. It is
possible that the dichotic results better represent the
listeners' actual performance on the task. Monotic and diotic
results might be enhanced by a frequency modulation (FM) cue,
as suggested by Feth and Stover (1987), who proposed that
interactions among the components of profile analysis stimuli
produce FM artifacts that are detected by the listener. These
modulation cues, which are available to the listener in
monotic and diotic profile analysis tasks, artificially
enhance listener performance. The cues are not salient in the
dichotic listening situation, hence the "poorer" dichotic
performance.

V.  SUMMARY 

The  major  findings  of  this  study  are  as  follows: 

1) The results of these experiments suggest that most
subjects are able to perform the dichotic profile analysis
task. This observation, together with the significant
differences in slope of the psychometric functions between
the diotic and dichotic results, suggests that a different
mechanism may be available for processing spectral shape
information presented dichotically.

2)  A  significant  frequency  effect  was  not  noted  for  either 
diotic  or  dichotic  stimuli.  This  is  particularly  noteworthy 
in  the  dichotic  condition,  given  that  simple  binaural 
intensity  summation  tasks  demonstrate  a  significant  frequency 
effect,  with  lower  thresholds  observed  for  higher  frequency 
stimuli.  Since  a  frequency  effect  was  not  observed  for  the 
results  of  the  present  study,  a  simple  intensity  summation 
for  information  presented  to  both  ears  is  not  suspected  as 
the  underlying  explanation  for  the  dichotic  profile  analysis 
results.

3)  Although  similarities  may  exist  between  dichotic  profile 
analysis  and  other  types  of  binaural  listening  tasks,  profile 
analysis  and  interaural  intensity  differences  do  not  appear 
to  result  from  the  same  mechanism.  Subjects  who  were  capable 
of  performing  the  dichotic  profile  task  were  unable  to  perform 
the  IID  task,  even  after  considerable  training.  Conversely, 
the  only  listener  in  the  present  study  who  easily  learned  the 
IID  task  under  the  maximum  performance  conditions  reported  by 
Nuetzel  (1982)  was  unable  to  perform  the  dichotic  profile 
analysis  task. 

4)  Significant  individual  differences  in  listener  performance 
were  noted  for  the  profile  analysis  task.  This  is  consistent 
with  the  results  obtained  in  previous  studies  on  complex 
auditory  processing  in  general  and  specifically  in  other 
profile  analysis  studies.

The results of the present study suggest that profile
analysis is not mediated solely at the peripheral level and
that a central mechanism for extracting information from the
signal may exist. The dichotic thresholds obtained here are
lower than those obtained in most previous research on
dichotic profile analysis, which may be related, in part, to
methodological differences across studies.
Figure 1: Dichotic stimulus configurations. a) Green and Kidd,
1983; Bernstein and Green, 1987; b) Fantini, Schooneveldt, and
Moore, 1989; c) Whitelaw, Hsu, and Feth, 1991.


Figures  2a-d:  Psychometric  functions  for  individual  subjects  for 
diotic  and  dichotic  profile  analysis  performance. 
Figures 3a-c: Psychometric functions for group mean performance at each center frequency.
TABLE 1

DISCRIMINATION THRESHOLDS (dB) FOR INDIVIDUAL SUBJECTS

Subject   1000 Hz              2000 Hz
          Diotic   Dichotic    Diotic   Dichotic
1         1.5      3.0         1.5      3.5
2         2.5      7.0         3.0      5.5
3         2.0      4.5         2.0      4.5
4         4.0      N/A         4.0      N/A


TABLE 2

PERCENT CORRECT SCORES FOR INTERAURAL INTENSITY DIFFERENCES
(6 dB increment in all conditions)

Subject   Fixed          Roving (+/-10 dB)   Roving (+/-10 dB)
          2000 Hz,       2000 Hz,            2000 Hz right ear,
          both ears      both ears           2350 Hz left ear
1         86.3           47.3                52.7
2         70.3           57.7                61.0
3         56.7           58.0                52.3
4         93.3           90.7                84.3
Envelope-Weighted Average of Instantaneous Frequency

Signal: $s(t) = e(t)\cos[\theta(t)]$, where $e(t)$ is the instantaneous envelope and $\theta(t)$ is the instantaneous phase.

Instantaneous frequency:

$$f(t) = \frac{1}{2\pi}\,\frac{d\theta(t)}{dt}$$

EWAIF of signal $s(t)$, $0 \le t \le T$:

$$EW_s = \frac{\int_0^T e(t)\, f(t)\, dt}{\int_0^T e(t)\, dt}$$

Intensity-Weighted Average of Instantaneous Frequency

IWAIF of signal $s(t)$, $0 \le t \le T$:

$$IW_s = \frac{\int_0^T e^2(t)\, f(t)\, dt}{\int_0^T e^2(t)\, dt}$$


EWAIF: the envelope is the weighting function.

IWAIF: the square of the envelope is the weighting function.

EWAIF and IWAIF values are highly correlated.


IWAIF in the Frequency Domain

IWAIF of signal $s(t)$, $0 \le t \le T$:

$$IW_s = \frac{1}{2\pi}\,\frac{\int_0^\infty \omega\, |S(\omega)|^2\, d\omega}{\int_0^\infty |S(\omega)|^2\, d\omega}$$

where $S(\omega)$ is the Fourier transform of $s(t)$.

This  representation  leads  to 

•  A  fast  and  easy  computation  of  the  IWAIF 
using  FFT  algorithms. 

•  A  more  tractable  model  with  the  filterbank 
introduced. 

Henceforth  IWAIF  of  signals  is  calculated  in  the 
frequency  domain. 
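The frequency-domain route can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's software: IWAIF is computed as the power-weighted mean of the positive-frequency FFT bins, and EWAIF from the analytic signal (built by zeroing the negative-frequency half of the FFT), with e(t) = |z(t)| and f(t) = (1/2pi) dtheta/dt estimated from sample-to-sample phase differences. Levels in dB set only the relative amplitudes; the 20 kHz sampling rate and 0.5 s duration are assumptions chosen so that both tones fall exactly on FFT bins.

```python
import numpy as np

FS = 20000  # sampling rate in Hz (assumed)

def two_tone(f1, l1_db, f2, l2_db, fs=FS, dur=0.5):
    """Sum of two cosines; the dB values fix only the relative amplitudes."""
    t = np.arange(int(round(fs * dur))) / fs
    a1, a2 = 10.0 ** (l1_db / 20.0), 10.0 ** (l2_db / 20.0)
    return a1 * np.cos(2 * np.pi * f1 * t) + a2 * np.cos(2 * np.pi * f2 * t)

def iwaif_fft(x, fs=FS):
    """IWAIF (Hz): mean frequency of the positive-frequency power spectrum."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * power) / np.sum(power))

def analytic_signal(x):
    """Analytic signal: keep DC (and Nyquist), double positive bins, zero negative bins."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def ewaif(x, fs=FS):
    """EWAIF (Hz): envelope-weighted mean of instantaneous frequency."""
    z = analytic_signal(x)
    envelope = np.abs(z)                            # e(t)
    phase = np.unwrap(np.angle(z))                  # theta(t)
    inst_freq = np.diff(phase) * fs / (2 * np.pi)   # f(t) = (1/2pi) dtheta/dt
    weights = 0.5 * (envelope[:-1] + envelope[1:])  # e(t) at mid-sample points
    return float(np.sum(weights * inst_freq) / np.sum(weights))

pair_a = two_tone(1000, 71, 1020, 70)
pair_b = two_tone(1000, 70, 1020, 71)
for name, x in (("(1000 Hz, 71 dB) + (1020 Hz, 70 dB)", pair_a),
                ("(1000 Hz, 70 dB) + (1020 Hz, 71 dB)", pair_b)):
    print(f"{name}: EWAIF {ewaif(x):.2f} Hz, IWAIF {iwaif_fft(x):.2f} Hz")
```

For the Voelcker pair (1000 Hz, 71 dB) + (1020 Hz, 70 dB), the on-bin IWAIF is 1000 + 20/(1 + 10^0.1), about 1008.85 Hz, close to the 1008.82 Hz listed for this pair in the comparison of EWAIF and IWAIF values; small differences may reflect the sampling and duration of the original computation, which are not stated. Because a complementary pair has the same envelope and an instantaneous frequency mirrored about 1010 Hz, the EWAIF (and IWAIF) values of the two members sum to 2020 Hz.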


Figure 1. Sampled signal, envelope and intensity functions, and instantaneous frequency
fluctuations for a typical complementary pair of Voelcker tones (panel labels: (1000 Hz, 70 dB) and (1020 Hz, 71 dB)).


Comparison of IWAIF and EWAIF

Signal                                 EWAIF (Hz)   IWAIF (Hz)
(1000 Hz, 71 dB) + (1020 Hz, 70 dB)    1007.59      1008.82
(1000 Hz, 70 dB) + (1020 Hz, 71 dB)    1012.41      1011.18

EWAIF/IWAIF values for complementary Voelcker signal pairs


Figure: EWAIF (broken line) and IWAIF (solid line) versus number of components for profile signals (Green, Mason, and Kidd, 1984).
Multichannel IWAIF

Problem: Given a complex signal, we know how to compute its IWAIF. How do we
get a measure similar to the IWAIF that accounts for the filtering by the
basilar membrane?
A Distance Measure

Computation of the distance measure, D,
between s(t) and m(t)


Distance  measure, 
P(C) in 2IFC

Application to two-component complex tones

Figure: Distance measure, D, for two-component complex tones (Feth and O’Malley, 1977) versus frequency separation (Hz); f’ marks the ~75% estimate.


Center frequency (Hz)   ~75% estimate (Hz)   Distance measure, D
250                     223                  0.0635
500                     364                  0.0543
1000                    692                  0.0619

Application to Auditory Profile Signals

Figure: Distance measure, D, for profile signals (Green, Mason, and Kidd, 1984).

D varies very little, indicating equal listener performance.


Conclusions

•  Extension of EWAIF theory to wideband signals after accounting for the auditory
filterbank.

•  Definition of a distance measure between two signals to be discriminated.

•  Results indicate that the distance measure accurately reflects listener performance
in discrimination tasks involving profile signals and two-component complex tones.

•  Future applications to comodulation masking release, modulation masking, and speech
recognition.
ABSTRACT


It is often assumed that the relevant information in formant transitions includes
the direction as well as the extent of frequency excursions. We have devised a
study to determine the ability of listeners with normal hearing to identify the
direction of linear frequency modulation of sinusoidal tones. To approximate more
closely the listener's task in processing speech, music, or other environmentally
important sounds, the initial frequency of each transition was selected at random
from within a predefined range. Thus, in each interval of a 2Q, 2AFC listening task,
the listener hears a frequency glide of the same duration and extent. In one of the
two middle intervals, the direction of the transition is reversed to produce the
"signal". Since each of the four glides begins and ends on frequencies selected at
random, the listener cannot rely on simple pitch differences to determine which
interval contained the reversed glide. Our preliminary results will be discussed
with emphasis on the effect of the width of the "roving frequency" range on
listener performance. (Work supported by a grant from AFOSR.)


INTRODUCTION 


Speech information is conveyed by the dynamics of formant transitions. The
extent and direction of formant transitions, especially those of the second formant,
provide acoustic cues useful in identifying the place of articulation of consonants.
Previous studies using frequency-modulated (FM) sinusoidal tones with normal-hearing
listeners indicate that the listener may use pitch cues derived from differing
endpoint frequencies to determine the direction of linear frequency transitions
(Nabelek, Nabelek and Hirsh, 1970; Carlyon and Stubbs, 1989). We attempted to
render these pitch cues unreliable by randomizing the starting frequency of each of
the glide tones presented in a two-cue, two-alternative forced-choice (2Q, 2AFC)
task. Our procedure is a frequency-domain analog of the roving-level paradigm used
in the study of intensity perception (Berliner and Durlach, 1973) and in profile
analysis studies (Green, 1988).

The preliminary work described here was designed to determine the boundaries
of the range over which the starting frequency should be randomized. To do so,
listeners were asked to distinguish between two linear FM glides differing only in
the direction of their transitions. Initially, the starting frequency (or ending
frequency, for falling glides) was fixed at 1000 Hz, and listener performance was
assessed for small frequency excursions. The task was simply to indicate which of
the two center listening intervals contained a glide falling in frequency when the
remaining three intervals contained glides rising in frequency. Once this baseline
performance was established, the starting frequency was selected from a uniform
(rectangular) distribution with a mean of 1000 Hz. With this “roving frequency”
procedure, the frequency excursions were increased and psychometric functions
were obtained again.
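Consider a hypothetical observer who ignores glide direction and picks the middle interval whose glide starts at the higher frequency. This shows how roving weakens endpoint pitch cues. A DN target starts Δf above its anchor, while an UP standard starts on its anchor. This observer model is an illustrative assumption, not a model proposed in the study. With the anchor fixed at 1000 Hz, the observer is always correct. Now suppose anchors are drawn independently from a uniform range of width W. Their difference then has a triangular distribution on [-W, W], which gives P(correct) = 1 - (W - Δf)²/(2W²) for 0 ≤ Δf ≤ W. The sketch evaluates this and checks it by Monte Carlo simulation. The function names are hypothetical.

```python
import random


def onset_cue_p_correct(sweep_hz, rove_width_hz):
    """P(correct) for a hypothetical observer who chooses the middle interval
    with the higher glide onset frequency, ignoring glide direction."""
    if rove_width_hz == 0:
        # Fixed anchor: the DN target's onset is always sweep_hz higher.
        return 1.0 if sweep_hz > 0 else 0.5
    d = min(sweep_hz, rove_width_hz)  # onset cue is perfect once sweep >= W
    return 1.0 - (rove_width_hz - d) ** 2 / (2.0 * rove_width_hz ** 2)


def simulate_onset_cue(sweep_hz, rove_width_hz, n_trials=100_000, seed=1):
    """Monte Carlo check of onset_cue_p_correct; anchors roved around 1000 Hz."""
    rng = random.Random(seed)
    half = rove_width_hz / 2.0
    correct = 0
    for _ in range(n_trials):
        anchor_target = rng.uniform(1000 - half, 1000 + half)
        anchor_standard = rng.uniform(1000 - half, 1000 + half)
        onset_target = anchor_target + sweep_hz  # DN glide starts above its anchor
        onset_standard = anchor_standard         # UP glide starts on its anchor
        if onset_target > onset_standard:
            correct += 1
        elif onset_target == onset_standard:
            correct += rng.random() < 0.5  # guess on ties
    return correct / n_trials
```

Take a 25-Hz sweep as an example. This observer is always correct with a fixed anchor but falls to 71.875% correct with a 100-Hz rove.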


PROCEDURES 


SUBJECTS 

Five listeners with normal hearing (thresholds better than 15 dB HL re ANSI
1969 from 250 to 8000 Hz) participated in the study. The listeners ranged in age
from 20 to 24 years. Prior to data collection, listeners were well practiced at
the task; all except S1 had previous experience in psychoacoustic experiments.
Subjects S1 and S3 have had some musical training.


SIGNALS 

Sinusoids with linear frequency modulation were generated on-line using an
Ariel DSP-16 signal processing board mounted in a Zenith 159 microcomputer. All
signals were generated at a 100-kHz sampling rate and low-pass filtered at 8 kHz.
The frequency sweep was fixed for every glide within a block of 50 trials, but the
starting frequency could be selected from a uniform (rectangular) distribution
centered on 1000 Hz. Signal duration was always 60 msec, including 5-msec rise and
decay times. Signals were either rising (UP) or falling (DN) linear frequency glides.
In the roving frequency conditions, the starting frequency for each UP glide was
selected at random from within a predefined range; DN glides ended on the selected
frequency. The widths of the starting-frequency ranges examined were 50, 100, 200,
and 400 Hz, each centered on 1000 Hz. Frequency excursions for the glides were 50,
20, 15, 10, 5, and 2 Hz.
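The glide synthesis described above can be sketched in pure Python. This is an illustration, not the Ariel DSP-16 code used in the study. The raised-cosine ramp shape is an assumption, since the text gives only the 5-msec rise and decay times. The 8-kHz low-pass filter is omitted, and the names `glide_endpoints` and `linear_fm_glide` are hypothetical. The phase is the running integral of an instantaneous frequency that changes linearly from the start frequency to the end frequency.

```python
import math


def glide_endpoints(anchor_hz, sweep_hz, direction):
    """Start/end frequencies: an UP glide starts on the anchor, a DN glide ends on it."""
    if direction == "UP":
        return anchor_hz, anchor_hz + sweep_hz
    if direction == "DN":
        return anchor_hz + sweep_hz, anchor_hz
    raise ValueError("direction must be 'UP' or 'DN'")


def linear_fm_glide(anchor_hz, sweep_hz, direction,
                    fs=100_000, dur_s=0.060, ramp_s=0.005):
    """Samples of a linear FM glide with (assumed) raised-cosine onset/offset ramps."""
    f0, f1 = glide_endpoints(anchor_hz, sweep_hz, direction)
    n = int(round(dur_s * fs))
    n_ramp = int(round(ramp_s * fs))
    rate = (f1 - f0) / dur_s  # frequency change in Hz per second
    samples = []
    for k in range(n):
        t = k / fs
        # Phase = 2*pi * integral of f(t) = f0 + rate * t.
        phase = 2 * math.pi * (f0 * t + 0.5 * rate * t * t)
        if k < n_ramp:
            gain = 0.5 * (1 - math.cos(math.pi * k / n_ramp))            # rise
        elif k >= n - n_ramp:
            gain = 0.5 * (1 - math.cos(math.pi * (n - 1 - k) / n_ramp))  # decay
        else:
            gain = 1.0
        samples.append(gain * math.sin(phase))
    return samples
```

For example, `linear_fm_glide(1000.0, 20.0, "DN")` returns 6000 samples of a glide falling from 1020 Hz to 1000 Hz.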


METHODS 

Listeners were seated in separate sound-isolated booths facing a monitor and a
color computer used for entering their responses. Three subjects could listen
simultaneously. Signals were presented at 50 dB SL through one side of a Sennheiser
HD414SL headset. In each interval of the 2Q, 2AFC task, the listener heard a
frequency glide. The target was a DN glide that appeared in only one of the two
middle listening intervals (interval two or three). Correct-response feedback was
given after each trial. Data were collected in blocks of 50 trials. Each data point
shown is the average of at least six blocks, that is, at least 300 trials.
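The trial structure described above can be sketched as a hypothetical trial generator plus the percent-correct score computed for each block. Each trial has four intervals, with a DN target in interval two or three and UP glides elsewhere. Under roving, the starting frequency is redrawn for each interval. Drawing an independent anchor per interval follows the statement that the glide in each listening interval began on a different frequency. The names `make_trial` and `percent_correct` are illustrative.

```python
import random


def make_trial(rng, sweep_hz, rove_width_hz, center_hz=1000.0):
    """Build one 2Q, 2AFC trial: four glides, with the DN target in interval 2 or 3.

    Returns (target_interval, [(direction, anchor_hz, sweep_hz), ...]),
    where intervals are numbered 1-4. rove_width_hz = 0 gives the fixed condition.
    """
    target = rng.choice([2, 3])
    half = rove_width_hz / 2.0
    intervals = []
    for i in range(1, 5):
        anchor = rng.uniform(center_hz - half, center_hz + half)  # redrawn per interval
        direction = "DN" if i == target else "UP"
        intervals.append((direction, anchor, sweep_hz))
    return target, intervals


def percent_correct(responses, targets):
    """Percentage of trials on which the chosen interval matched the target."""
    hits = sum(r == t for r, t in zip(responses, targets))
    return 100.0 * hits / len(targets)
```

A block of 50 trials would call `make_trial` 50 times with the block's fixed sweep. Under this sketch, a data point pools at least six such blocks.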

Figure  1  illustrates  the  listening  task  for  the  fixed  and  two  “typical”  roving 
starting  frequency  conditions.  In  the  top  row,  each  FM  glide  begins,  or  ends,  at 
1000  Hz.  The  target  is  shown  in  interval  three.  For  the  “roving”  starting  frequency 
conditions,  the  glide  in  each  listening  interval  began  on  a  different  frequency.  In  the 
second  row,  the  range  of  starting  frequencies  covers  50  Hz,  centered  on  1000  Hz. 
The  sweeps  shown  extend  beyond  the  width  of  the  50-Hz  range.  The  target  is 
shown  in  interval  two.  In  the  bottom  row,  the  range  of  starting  frequencies  covers 
400  Hz,  much  wider  than  any  sweep  width  used  in  the  study.  The  target  is  shown 
in  interval  three. 


RESULTS 


Psychometric functions for each of the five listeners are shown in Figure 2.
Percent correct discrimination is plotted as a function of the extent of the frequency
sweep of the glide. Each panel shows results for the fixed condition and for the four
random starting-frequency ranges. The 75%-correct discrimination point is called
the direction discrimination threshold. In the fixed condition, all listeners except S4
are able to discriminate UP from DN glides with a 5-Hz frequency sweep of 50-msec
duration; S4 requires a frequency sweep of approximately 15 Hz. Introducing a
random starting frequency degrades discrimination performance for every listener;
however, as the width of the starting-frequency range increases, each listener's
psychometric functions remain parallel.
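Thresholds are defined here as the 75%-correct point of each psychometric function. One simple way to read that point off measured data is linear interpolation between the two sweep extents that bracket 75% correct. This is an assumption for illustration, since the text does not state how thresholds were estimated, and fitting a psychometric function would be an equally reasonable choice. The function name `threshold_75` is hypothetical.

```python
def threshold_75(points, criterion=75.0):
    """Sweep extent (Hz) at which percent correct first reaches the criterion.

    points: iterable of (sweep_hz, percent_correct) pairs.
    Linearly interpolates between the two measured points bracketing the
    criterion. Returns the smallest sweep if it already meets the criterion
    (the threshold is then at or below that sweep), or None if never reached.
    """
    pts = sorted(points)
    if pts and pts[0][1] >= criterion:
        return pts[0][0]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None
```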

For S1, the direction discrimination threshold moves from less than 5 Hz in
the fixed condition to about 25 Hz when the range of starting frequencies reaches or
exceeds 100 Hz. S1's psychometric functions coincide for the frequency ranges of
100 Hz, 200 Hz, and 400 Hz. Performance for S2b and S3 is similar to that of S1,
although S3's direction discrimination threshold is around 30 Hz. Most of the
listeners exhibited a relatively long learning period. Except for S1 (who had not
participated in prior psychoacoustic experiments), each listener initially showed
poor direction discrimination thresholds in both fixed and random starting-frequency
conditions. After extensive practice, results for S2 and S3 resemble those for S1.
To illustrate this, the early performance of S2 (S2a) resembles that of S4: the
75%-correct point for the fixed condition lies just above 10 Hz, and thresholds for
the random starting-frequency conditions range from 15 to 100 Hz. Listeners S4 and
S5 have just begun to participate in the experiment. We expect that, with more
practice, their performance will be similar to that of the initial three listeners.


CONCLUSIONS 


For the preliminary data shown here, we find that:

•  Randomizing the starting frequency of linear FM glides increases the glide
direction discrimination threshold approximately five-fold.

•  Increasing the width of the random starting-frequency range beyond 100 Hz
(centered on 1000 Hz) has no further effect on performance.

•  Most  listeners  require  extensive  practice  to  reach  asymptotic  performance. 


REFERENCES 


Berliner, J.E. and Durlach, N.I. (1973). Intensity Perception. IV. Resolution in a
roving-level discrimination. J. Acoust. Soc. Amer., 53, 1270-1287.

Carlyon, R.P. and Stubbs, R.J. (1989). Detecting single-cycle frequency modulation
imposed on sinusoids, harmonic and inharmonic carriers. J. Acoust. Soc. Amer.,
85, 2563-2574.

Green, D.M. (1988). Profile Analysis: Auditory Intensity Discrimination. New
York: Oxford University Press.

Nabelek, I.V., Nabelek, A.K. and Hirsh, I.J. (1970). Pitch of tone bursts of
changing frequency. J. Acoust. Soc. Amer., 48, 536-553.


FIGURE  1.  Glide  Direction  Discrimination  Paradigm. 

Top: fixed starting frequency. Middle: random starting frequency with a range
narrower than the sweep width. Bottom: random starting frequency with a range
much wider than the sweep width.


FIGURE  2.  Results  for  Glide  Direction  Discrimination. 

Psychometric functions for five listeners with normal hearing. Percent correct
discrimination as a function of frequency sweep width in 2Q, 2AFC. Results for
listener 2 are shown for early performance (S2a) and after extensive practice (S2b).